Why does redirecting lines of ≥1022 characters take 83 times longer than redirecting lines of ≥1021 characters?

2024-6-8

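I have a text file of about 10 MB containing about 50,000 lines. When I select all lines of at least 1021 bytes and redirect the output to a regular file or pipe it to cat, it takes 0.135 seconds. When I change the threshold to ≥1022 bytes, it takes about 11.5 seconds, which is roughly 83 times longer. The results are identical.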

$ time grep '^.{1021,}$' my_file > /tmp/grep1021

real    0m0.135s
user    0m0.120s
sys     0m0.013s
$ time grep '^.{1022,}$' my_file > /tmp/grep1022

real    0m11.483s
user    0m11.036s
sys     0m0.441s
$ cmp /tmp/grep102?
$

Beyond that threshold, the time grows considerably. For lines of ≥2,100 characters it takes 52 seconds (the results are still identical).

$ time grep '^.{1200,}$' my_file | cat > /dev/null

real    0m15.903s
user    0m15.182s
sys     0m0.737s
$ time grep '^.{1500,}$' my_file | cat > /dev/null

real    0m27.114s
user    0m24.584s
sys     0m2.545s
$ time grep '^.{1800,}$' my_file | cat > /dev/null

real    0m36.468s
user    0m34.889s
sys     0m1.594s
$ time grep '^.{2100,}$' my_file | cat > /dev/null

real    0m52.164s
user    0m47.949s
sys     0m4.221s
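
The measurements above can be collected in one sweep. This is only a sketch under assumptions: it needs bash for the `time` keyword, it passes `-E` so that the `{n,}` interval is interpreted as a repetition count, and the helper name `sweep_count` and the `FILE` variable are made up for illustration. It pipes into `wc -l` so that each run also reports how many lines matched.

```shell
# Sketch: time the same pipeline at several thresholds.
# Assumptions: bash (for the `time` keyword) and a grep that supports -E.
# sweep_count and FILE are hypothetical names, not part of the original post.
sweep_count() {
    # $1 = minimum line length, $2 = input file
    grep -E "^.{$1,}\$" "$2" | wc -l
}

# Run the sweep only when FILE is set, e.g. FILE=my_file bash sweep.sh
if [ -n "${FILE:-}" ]; then
    for n in 1021 1022 1200 1500 1800 2100; do
        echo "threshold: $n"
        time sweep_count "$n" "$FILE"
    done
fi
```

If the matched line count stays the same across thresholds while the time keeps climbing, the extra time is not explained by the amount of output.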

This does not happen with grep on its own; grep by itself is fast enough:

$ time grep '^.{1022,}$' my_file > /dev/null

real    0m0.073s
user    0m0.060s
sys     0m0.012s
$ time grep '^.{3000,}$' my_file > /dev/null

real    0m0.495s
user    0m0.411s
sys     0m0.084s
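
One caveat about these two timings: as far as I know, GNU grep stops at the first match when it detects that standard output is /dev/null, so these runs may not scan the whole file. A way to force a full scan without writing the matched lines is to count them with `-c` and capture the count. The sketch below assumes a grep that supports `-E`; `count_matches` and `FILE` are hypothetical names.

```shell
# Sketch: count matching lines instead of printing them, so the whole file
# is scanned but no matched text is written anywhere.
# Assumptions: grep supports -E; count_matches and FILE are hypothetical names.
count_matches() {
    # $1 = minimum line length, $2 = input file
    grep -E -c "^.{$1,}\$" "$2"
}

# Compare the two thresholds only when FILE is set,
# e.g. FILE=my_file sh count.sh
if [ -n "${FILE:-}" ]; then
    echo "lines >= 1021: $(count_matches 1021 "$FILE")"
    echo "lines >= 1022: $(count_matches 1022 "$FILE")"
fi
```

Wrapping each call in `time` (in bash) would show whether the slowdown is already present in matching alone, independent of where the output goes.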

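Why does this happen? My guess is that it has something to do with chunking, but I cannot explain why passing less data into the pipeline takes so much more processing time. The boundary is suspiciously close to 1024.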

The system runs openSUSE 15.3 with Linux kernel 5.3.18-59.19-default.

Additional information:

Adding --line-buffered to grep makes no difference.
Using awk 'length >= n' instead, the process runs very fast.

$ time grep '^.{1021,}$' my_file | wc
    263   30511 1899841

real    0m0.162s
user    0m0.147s
sys     0m0.031s
$ time grep '^.{1022,}$' my_file | wc
    263   30511 1899841

real    0m11.514s
user    0m11.044s
sys     0m0.487s

$ ulimit -p
8
$ time grep --line-buffered '^.{1021,}$' my_file | cat > /dev/null

real    0m0.137s
user    0m0.120s
sys     0m0.027s
$ time grep --line-buffered '^.{1022,}$' my_file | cat > /dev/null

real    0m11.528s
user    0m10.989s
sys     0m0.547s
$ time awk 'length >= 1021' my_file | cat > /dev/null

real    0m0.044s
user    0m0.041s
sys     0m0.008s
$ time awk 'length >= 1022' my_file | cat > /dev/null

real    0m0.044s
user    0m0.045s
sys     0m0.005s
$ time awk 'length >= 3000' my_file | cat > /dev/null

real    0m0.045s
user    0m0.038s
sys     0m0.012s
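
For anyone who wants to try this without my file, here is a minimal sketch that builds a synthetic input. Everything about it is an assumption: the helper name `make_test_file`, the path /tmp/synthetic_file, and the length distribution (every 200th line is 1021 characters or longer, all other lines are at most 150 characters). It does not reproduce the content of my_file, so the timings may differ.

```shell
# Hypothetical helper: write $2 lines to the file $1.
# Every 200th line is long (>= 1021 characters); the rest are short (<= 150).
# This distribution is an assumption, not a description of the original file.
make_test_file() {
    awk -v n="$2" 'BEGIN {
        for (i = 1; i <= n; i++) {
            len = (i % 200 == 0) ? 1021 + (i % 3000) : 1 + (i % 150)
            line = sprintf("%" len "s", "")   # a string of len spaces
            gsub(/ /, "x", line)              # replace spaces with a visible character
            print line
        }
    }' > "$1"
}

make_test_file /tmp/synthetic_file 50000

# Sanity check with awk, which was fast in the measurements above.
awk 'length >= 1021' /tmp/synthetic_file | wc -l
```

The generated file contains some lines of exactly 1021 characters, so the 1021 and 1022 thresholds select different sets here, unlike in my file.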
