How can "parallelism" be used to speed up "sort" for large files that fit in RAM?

Question

Assume a machine with 48 cores and 500 GB of free RAM, and a file of 100 M lines that fits in memory.

With a plain sort, this is quite slow:

$ time sort bigfile > bigfile.sort
real    4m48.664s
user    21m15.259s
sys     0m42.184s

Ignoring the locale makes it somewhat faster:

$ export LC_ALL=C
$ time sort bigfile > bigfile.sort
real    1m51.957s
user    6m2.053s
sys     0m42.524s
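
Setting LC_ALL=C is faster because sort then compares raw bytes instead of applying the locale's collation rules, and that also changes the resulting order. A minimal sketch on a tiny made-up input (the four lines are illustrative only, not taken from bigfile):

```shell
# In the C locale, sort compares bytes, so every uppercase ASCII letter
# sorts before every lowercase one.
c_sorted=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort)
printf '%s\n' "$c_sorted"
```

If downstream tools expect locale-aware ordering, the output of the faster run may not be interchangeable with the output of the slower one.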

You can make sort a little faster still by telling it to use more cores:

$ export LC_ALL=C
$ time sort --parallel=48 bigfile > bigfile.sort
real    1m39.977s
user    15m32.202s
sys     1m1.336s

You can also try giving sort more working memory (this does not help if sort already has enough memory):

$ export LC_ALL=C
$ time sort --buffer-size=80% --parallel=48 bigfile > bigfile.sort
real    1m39.779s
user    14m31.033s
sys     1m0.304s
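
The settings tried so far can be bundled into a small wrapper. This is only a sketch: the function name fastsort is hypothetical, and it assumes GNU coreutils (for nproc and the --parallel and --buffer-size options of sort).

```shell
# Hypothetical helper (not from the original post): applies the C locale,
# one sort thread per available core, and a large buffer to its arguments.
fastsort() {
    # nproc (GNU coreutils) prints the number of available processing units
    LC_ALL=C sort --parallel="$(nproc)" --buffer-size=80% "$@"
}
```

It would be called like sort itself, for example `fastsort bigfile > bigfile.sort`.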

However, sort still seems to do a lot of its work in a single thread. You can force it to parallelize more like this:

$ merge() {
    if [ $1 -le 1 ] ; then
        parallel -Xj1 -n2 --dr 'sort -m <({=uq=}) | mbuffer -m 30M;'
    else
        parallel -Xj1 -n2 --dr 'sort -m <({=uq=}) | mbuffer -m 30M;' |
          merge $(( $1/2 ));
    fi
  }
# Generate commands that will read blocks of bigfile and sort those
# This only builds the command - it does not run anything
$ parallel --pipepart -a bigfile --block -1 --dr -vv sort |
    # Merge these commands 2 by 2 until only one is left
    # This only builds the command - it does not run anything
    merge $(parallel --number-of-threads) |
    # Execute the command
    # This runs the command built in the previous step
    bash > bigfile.sort
real    0m30.906s
user    0m21.963s
sys     0m28.870s

This cuts the file into 48 blocks on the fly (one block per core) and sorts those blocks in parallel. The sorted blocks are then merged in pairs, the merged results are merged in pairs again, and so on until only a single stream is left. All of this is done in parallel where possible.
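
The same split, sort, and merge idea can be sketched with GNU coreutils alone. This is a simplified illustration, not the pipeline above: it writes temporary chunk files instead of reading blocks on the fly, and it merges all sorted chunks in a single `sort -m` step rather than pairwise. The function name chunked_sort and the default of 4 chunks are assumptions made for the example.

```shell
# Hypothetical helper, not the author's implementation: sort FILE by
# splitting it into N line-aligned chunks, sorting the chunks
# concurrently, and merging the sorted chunks in one final step.
chunked_sort() {
    local file=$1 chunks=${2:-4} tmpdir c
    tmpdir=$(mktemp -d) || return 1
    # GNU split: l/N creates N chunks without breaking lines
    split -n "l/$chunks" "$file" "$tmpdir/chunk."
    # Sort every chunk in the background, then wait for all of them
    for c in "$tmpdir"/chunk.*; do
        LC_ALL=C sort -o "$c.sorted" "$c" &
    done
    wait
    # sort -m merges already-sorted inputs without re-sorting them
    LC_ALL=C sort -m "$tmpdir"/chunk.*.sorted
    rm -rf "$tmpdir"
}
```

Usage would look like `chunked_sort bigfile 48 > bigfile.sort`. With a single final merge the last step is one process reading 48 inputs, which is the bottleneck the pairwise merge tree is meant to spread across cores.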

For a 100 GB file with 4 G lines, the timings are:

$ LC_ALL=C time sort --parallel=48 -S 80% --compress-program pzstd bigfile >/dev/null
real    77m22.255s
$ LC_ALL=C time parsort bigfile >/dev/null
649.49user 727.04system 18:10.37elapsed 126%CPU (0avgtext+0avgdata 32896maxresident)k

So parallelizing gives a speedup of roughly a factor of 4 (77m22s versus 18m10s).

To make this easier to use, I turned it into a small tool: parsort, which is now part of GNU Parallel.

It also supports sort options and reading from stdin (parsort -k2rn < bigfile).
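
Since parsort accepts sort options, the key specification in that command should mean what it means in plain sort. A small sketch using sort itself on a made-up three-line input (parsort may not be installed everywhere, so it is not called here):

```shell
# Made-up input: a name in field 1, a count in field 2.
# -k2rn: sort on field 2 onward, numerically (n), largest first (r).
by_count=$(printf 'x 5\ny 20\nz 3\n' | LC_ALL=C sort -k2rn)
printf '%s\n' "$by_count"
```

The numeric flag matters here: without it, "20" would be compared as text rather than as a number.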

Answer 1