How do I split a file so the pieces can be piped to a command (i.e., split to stdout)?

Answer 1

The simplest way is:

while IFS= read -r line; do
  { printf '%s\n' "$line"; head -n 99; } |
  other_commands
done <database_file

You have to use read for the first line of each section, because there appears to be no other way to stop once the end of the file is reached.
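
To try the loop without a real database, here is a minimal sketch. It assumes a head that leaves the file offset just past the last line it printed (GNU head does this for regular files). The helper name count_batches is hypothetical, and wc -l stands in for other_commands.

```shell
#!/bin/sh
# Sketch only: count_batches is a hypothetical helper, not part of the answer.
# read takes the first line of each batch; head copies the remaining N-1 lines.
count_batches() {
  # $1 = input file, $2 = lines per batch
  while IFS= read -r line; do
    { printf '%s\n' "$line"; head -n "$(($2 - 1))"; } | wc -l
  done <"$1"
}

tmp=$(mktemp)
seq 250 >"$tmp"
count_batches "$tmp" 100   # prints one line count per batch
rm -f "$tmp"
```

The input must be a regular file redirected into the loop; on a pipe, head may read ahead and swallow lines that belong to the next batch.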

Answer 2

Basically, you want split to write its output to stdout rather than to files.

If you have access to GNU split, its --filter option does exactly that:

‘--filter=command’

    With this option, rather than simply writing to each output file, write
    through a pipe to the specified shell command for each output file.

So, in your case, you could use --filter with a command such as:

split -l 100 --filter='{ cat Header.sql; cat; } | sqlcmd; printf %s\\n DONE' infile

Or write a script, e.g. myscript:

#!/bin/sh

{ cat Header.sql; cat; } | sqlcmd
printf %s\\n '--- PROCESSED ---'

Then simply run:

split -l 100 --filter=./myscript infile
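
The filter can be tried without sqlcmd. The sketch below assumes GNU split; run_batches and the HEADER variable are hypothetical names, and wc -l stands in for sqlcmd so that each batch simply reports how many lines it received, header included.

```shell
#!/bin/sh
# Sketch only, assuming GNU split (--filter is not in POSIX split).
# run_batches and HEADER are hypothetical; wc -l stands in for sqlcmd.
run_batches() {
  # $1 = input file, $2 = lines per batch, $3 = header file
  # HEADER is placed in split's environment, so the filter's shell inherits it.
  HEADER=$3 split -l "$2" --filter='{ cat "$HEADER"; cat; } | wc -l' "$1"
}

dir=$(mktemp -d)
printf '%s\n' '-- header --' >"$dir/Header.sql"
seq 250 >"$dir/infile"
run_batches "$dir/infile" 100 "$dir/Header.sql"   # one count per batch
rm -rf "$dir"
```

Passing the header path through the environment avoids embedding it in the quoted filter string.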

Answer 3

_linc() ( ${sh-da}sh ${dbg+-vx} 4<&0 <&3 ) 3<<-ARGS 3<<\CMD
        set -- $( [ $((i=${1%%*[!0-9]*}-1)) -gt 1 ] && {
                shift && echo "\${inc=$i}" ; }
        unset cmd ; [ $# -gt 0 ] || cmd='echo incr "#$((i=i+1))" ; cat'
        printf '%s ' 'me=$$ ;' \
        '_cmd() {' '${dbg+set -vx ;}' "$@" "$cmd" '
        }' )
        ARGS
        s= ; sed -f - <<-INC /dev/fd/4 | . /dev/stdin
                i_cmd <<"${s:=${me}SPLIT${me}}"
                ${inc:+$(printf '$!n\n%.0b' `seq $inc`)}
                a$s
        INC
CMD

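The function above uses sed to apply its argument list, as a command string, to an arbitrary line increment. The commands you give on the command line are sourced into a temporary shell function, which is fed a here-document on stdin containing each increment's worth of lines.

You use it like this: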

time printf 'this is line #%d\n' `seq 1000` |
_linc 193 sed -e \$= -e r \- \| tail -n2
    #output
193
this is line #193
193
this is line #386
193
this is line #579
193
this is line #772
193
this is line #965
35
this is line #1000
printf 'this is line #%d\n' `seq 1000`  0.00s user 0.00s system 0% cpu 0.004 total

The mechanism here is pretty simple:

i_cmd <<"${s:=${me}SPLIT${me}}"
${inc:+$(printf '$!n\n%.0b' `seq $inc`)}
a$s

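That is the sed script. Basically, we just printf $increment * n;. So if you set the increment to 100, printf writes a sed script of 100 lines that say only $!n, plus one insert line for the top of the here-document and one append line for the bottom. That's it. Most of the rest just handles options.

The n (next) command tells sed to print the current line, delete it, and pull in the next one. The $! address makes sed try this on every line except the last.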
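
The effect of a run of $!n lines between an insert and an append can be seen in isolation. This is a minimal sketch, not the author's code: group_lines is a hypothetical helper, it prints plain marker lines instead of here-document delimiters, and it builds the script with a loop rather than the printf/seq trick.

```shell
#!/bin/sh
# Sketch only: group_lines is a hypothetical helper illustrating the $!n idea.
# For a group size N the sed script is: one i\ line, N-1 "$!n" lines, one a\ line.
group_lines() {
  script=$(
    printf 'i\\\n%s\n' '--- begin ---'
    i=1
    while [ "$i" -lt "$1" ]; do
      printf '%s\n' '$!n'
      i=$((i + 1))
    done
    printf 'a\\\n%s\n' '--- end ---'
  )
  sed -e "$script"
}

printf '%s\n' 1 2 3 4 5 6 7 | group_lines 3
```

Each sed cycle therefore consumes up to N input lines, and the $! guard lets a short final group end cleanly at the last line.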

Given only an increment, it does this:

printf 'this is line #%d\n' `seq 10` |
_linc 3
    #output
incr #1
this is line #1
this is line #2
this is line #3
incr #2
this is line #4
this is line #5
this is line #6
incr #3
this is line #7
this is line #8
this is line #9
incr #4
this is line #10

So what happens behind the scenes is that, when no command string is given, the function is set to echo a counter and cat its input. On the command line it would look like this:

{ echo "incr #$((i=i+1))" ; cat ; } <<HEREDOC
this is line #7
this is line #8
this is line #9
HEREDOC

It runs one of these for every increment. Look:

printf 'this is line #%d\n' `seq 10` |
dbg= _linc 3
    #output
set -- ${inc=2}
+ set -- 2
me=$$ ; _cmd() { ${dbg+set -vx ;} echo incr "#$((i=i+1))" ; cat
}
+ me=19396
        s= ; sed -f - <<-INC /dev/fd/4 | . /dev/stdin
                i_cmd <<"${s:=${me}SPLIT${me}}"
                ${inc:+$(printf '$!n\n%.0b' `seq $inc`)}
                a$s
        INC
+ s=
+ . /dev/stdin
+ seq 2
+ printf $!n\n%.0b 1 2
+ sed -f - /dev/fd/4
_cmd <<"19396SPLIT19396"
this is line #1
this is line #2
this is line #3
19396SPLIT19396
+ _cmd
+ set -vx ; echo incr #1
+ cat
this is line #1
this is line #2
this is line #3
_cmd <<"19396SPLIT19396"
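
The trace shows the shape of the generated script: one _cmd definition, then a _cmd call with a here-document per increment. A simplified sketch of that idea follows. It is not the author's _linc: each_chunk is a hypothetical name, awk is swapped in for the sed script, and the delimiter is assumed never to appear as a whole input line.

```shell
#!/bin/sh
# Sketch only: each_chunk is a hypothetical, simplified take on the idea.
# It writes a small shell script to a pipe and lets sh execute it.
each_chunk() {
  n=$1
  shift
  delim="SPLIT$$SPLIT"   # assumed never to appear as a whole input line
  {
    # Define _cmd once, then emit "_cmd <<\DELIM ... DELIM" for every N lines.
    printf '_cmd() { %s\n}\n' "$*"
    awk -v n="$n" -v d="$delim" '
      (NR - 1) % n == 0 { if (NR > 1) print d; print "_cmd <<\\" d }
      { print }
      END { if (NR > 0) print d }'
  } | sh
}

# Every chunk is handled by the same shell, so the counter i persists.
printf '%s\n' a b c d e | each_chunk 2 'echo "chunk $((i = i + 1))"; cat'
```

Because the here-document delimiter is quoted, the chunk contents are passed through without expansion.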

It's very fast:

time yes | sed = | sed -n 'p;n' |
_linc 4000 'printf "current line and char count\n"
    sed "1w /dev/fd/2" | wc -c
    [ $((i=i+1)) -ge 5000 ] && kill "$me" || echo "$i"'

    #OUTPUT

current line and char count
19992001
36000
4999
current line and char count
19996001
36000
current line and char count
[2]    17113 terminated  yes |
       17114 terminated  sed = |
       17115 terminated  sed -n 'p;n'
yes  0.86s user 0.06s system 5% cpu 16.994 total
sed =  9.06s user 0.30s system 55% cpu 16.993 total
sed -n 'p;n'  7.68s user 0.38s system 47% cpu 16.992 total

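Above, I tell it to increment every 4000 lines. Seventeen seconds later, it had processed 20 million lines. Of course, the logic there isn't serious: it only reads each line twice and counts all the characters. But the possibilities are pretty open. Also, if you look closely, you'll notice that the filters supplying the input seem to take up most of the time anyway.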

Answer 4

I ended up with something that looks pretty gross. If there's a better way, please post it:

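#!/bin/bash
# Needs bash, not sh: uses +=, $'\n' and substring expansion.
# $1 = input file, $2 = number of lines per batch

DONE=false
until $DONE; do
    for i in $(seq 1 "$2"); do
        # -r and IFS= keep backslashes and leading whitespace intact
        IFS= read -r line || DONE=true
        # skip blank lines (and the empty reads after end of file)
        [ -z "$line" ] && continue
        lines+=$line$'\n'
    done
    # drop the last 10 characters of the batch before sending it
    sql=${lines::${#lines}-10}
    (cat "Header.sql"; echo "$sql") | sqlcmd
    #echo "--- PROCESSED ---"
    lines=
done < "$1"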

Run it as ./insert.sh "File.sql" 100, where 100 is the number of lines to process at a time.
