Separate paragraphs using Linux

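I have a text file that contains paragraphs run together. I need to separate the paragraphs with a blank line. Each paragraph should start with the pattern >FP0, but because the paragraphs are joined, the pattern currently does not appear at the start of a line. I tried a sed command, but it splits on the lines that contain >FP0, so the pattern still does not end up at the start of a new paragraph.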

Example paragraphs

>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

The sed command I used was:

sed '/>/s/^/\n/'

The output is:

>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

TTT>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

A>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

(There should be no characters before >FP0 at the start of a new paragraph.)

Answer 1

You can use Perl instead:

$ perl -pe 's/>/\n\n>/g' file


>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

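However, if the first character of the file is a >, this also puts blank lines at the top of the output. To avoid that, restrict the substitution to a > that is preceded by another character: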

$ perl -pe 's/(.)>/$1\n\n>/g' file
>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

Or, with GNU sed:

$ sed -E 's/(.)>/\1\n\n>/g' file
>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG

And with any sed, using a literal backslash-newline in the replacement:

sed 's/\(.\)>/\1\
\
>/g' file
>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
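
To sanity-check the restricted substitution, here is a minimal sketch that runs the portable sed form on a short, made-up sample read from standard input. The function name split_paragraphs and the sample records FP001 to FP003 are illustrative assumptions, not part of the original question.

```shell
# Hypothetical helper: put a blank line before every ">" that has a
# character in front of it. Reads the named files, or stdin if none given.
split_paragraphs() {
  sed 's/\(.\)>/\1\
\
>/g' "$@"
}

# Made-up sample: three records joined on one line.
printf '>FP001TTT>FP002AAA>FP003GGG\n' | split_paragraphs
```

This should print the three records separated by empty lines, with nothing in front of the first record, because the leading > has no preceding character on its line.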

Answer 2

Your sed script finds every line that contains a >, but it then adds the newline at the beginning of that line (that is what ^ means in a regex).

Perhaps try this instead:

sed 's/>/\n&/g' file

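However, whether \n in the replacement produces a literal newline depends on the sed version. The behavior you want is common on Linux platforms, but it is not universal. (Please clarify which distribution and/or sed version you have, or try a more portable solution such as Awk or Perl.)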

awk -F '>' 'BEGIN { OFS="\n>" } { $1=$1 } 1' file

The { $1 = $1 } hack forces Awk to rebuild the line using OFS. If nothing on the line changes, Awk optimizes by simply copying the input to the output; the assignment makes it think something changed.

If you want an empty line before each new paragraph, use two newlines: put \n\n wherever the commands above have \n.
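
As a rough sketch of the two-newline variant, the Awk command can be wrapped as below. The function name split_with_awk and the sample records are made up for illustration.

```shell
# Hypothetical helper: the same Awk approach, but with two newlines in OFS
# so that every ">" is preceded by an empty line.
split_with_awk() {
  awk -F '>' 'BEGIN { OFS = "\n\n>" } { $1 = $1 } 1' "$@"
}

# Made-up sample: two records joined on one line.
printf '>FP001TTT>FP002AAA\n' | split_with_awk
```

Because the sample line starts with >, the first field is empty, so the output begins with two empty lines before the first record.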

Answer 3

GNU sed

$ sed 's/>/\n\n&/2g' input_file

POSIX sed

sed -e '
  y/>/\n/
  s/\n/>/
  s//&&>/g
' input_file
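
The POSIX script is terse, so here is the same script again with each step commented, as a sketch for reading along. The wrapper name split_posix and the sample input are assumptions added for illustration.

```shell
# Hypothetical helper wrapping the POSIX sed script, one comment per step.
split_posix() {
  sed -e '
    # 1. Turn every ">" into a newline, a character that cannot otherwise
    #    occur in the pattern space of a single input line.
    y/>/\n/
    # 2. Turn the first newline back into ">": the first marker stays put.
    s/\n/>/
    # 3. The empty regex reuses the previous one (\n); each remaining
    #    newline becomes two newlines followed by ">".
    s//&&>/g
  ' "$@"
}

# Made-up sample: three records joined on one line.
printf '>FP001TTT>FP002AAA>FP003GGG\n' | split_posix
```

Note that the first > on each line is never split off, even when other text precedes it, so this form assumes every input line begins with a > marker.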

Perl

$ perl -pe 's/(?<!^)(?=>)/\n\n/g' input_file

awk

awk -v RS=">" -v ORS= '
NR>1&&sub(/^/,(!k++ ? ORS : "\n\n") RS)
' input_file
