![Separate paragraphs using Linux](https://linux55.com/image/204970/%EB%A6%AC%EB%88%85%EC%8A%A4%EB%A5%BC%20%EC%82%AC%EC%9A%A9%ED%95%98%EC%97%AC%20%EB%8B%A8%EB%9D%BD%EC%9D%84%20%EB%B6%84%EB%A6%AC%ED%95%98%EC%84%B8%EC%9A%94.png)
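I have a text file containing paragraphs that are run together. I need to separate each paragraph with a blank line. Each paragraph should start with the pattern >FP0, but because the paragraphs are joined to one another, the pattern currently does not appear at the start of a line in the file. I tried a sed command, but it splits on the lines that contain the >FP0 pattern, so the pattern does not end up at the start of the new paragraph.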
Example paragraph:
>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
The sed command I used:
sed '/>/s/^/\n/'
The output is:
>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
TTT>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
A>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
(There should be no characters before >FP0 at the start of a new paragraph.)
Answer 1
You can use Perl instead:
$ perl -pe 's/>/\n\n>/g' file
>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
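However, since the first character of the file is itself a >, that command also inserts blank lines at the very top of the output. To avoid this, you can restrict the substitution to a > that has another character before it: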
$ perl -pe 's/(.)>/$1\n\n>/g' file
>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
Or, with GNU sed:
$ sed -E 's/(.)>/\1\n\n>/g' file
>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
And in a form that should also work with other sed implementations, using literal newlines in the replacement:
sed 's/\(.\)>/\1\
\
>/g' file
>FP004340TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT

>FP00598AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

>FP005521GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
Answer 2
Your sed script finds every line that contains a >, but it then adds the newline at the beginning of that line (that is what ^ means in a regular expression), not in front of the >.
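To make the effect of ^ visible, here is a small sketch on a hypothetical line, with a visible @ standing in for the newline so that each result stays on one line. The sample string and the @ marker are illustrative assumptions, not part of the question.

```shell
# Hypothetical line in which a new record starts mid-line.
line='AAA>BBB'

# The address />/ only selects the line; s/^/@/ then inserts at the start of that line.
at_start=$(printf '%s\n' "$line" | sed '/>/s/^/@/')

# Matching ">" itself puts the insertion where the new record begins.
at_marker=$(printf '%s\n' "$line" | sed 's/>/@&/g')

printf '%s\n%s\n' "$at_start" "$at_marker"
# prints:
# @AAA>BBB
# AAA@>BBB
```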
Perhaps try this instead:
sed 's/>/\n&/g' file
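However, whether \n produces a literal newline depends on your version of sed. The behavior you want is common on Linux platforms, but it is not universal. (Either clarify which distribution and/or sed version you are using, or try a more portable solution such as Awk or Perl.)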
awk -F '>' 'BEGIN { OFS="\n>" } { $1=$1 } 1' file
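The { $1 = $1 } hack forces Awk to split the line and rebuild it. When nothing on the line has been changed, Awk optimizes by simply copying the input to the output; the assignment makes it think something changed, so the fields are rejoined with the new output separator.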
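The effect of the self-assignment can be checked on a tiny input. This sketch uses a hypothetical line and a visible | as the output separator (both are illustrative assumptions) to compare the script with and without the assignment.

```shell
# Hypothetical line with ">" as the field separator.
line='a>b>c'

# No field is assigned, so the record is printed exactly as read and OFS is never used.
untouched=$(printf '%s\n' "$line" | awk -F '>' 'BEGIN { OFS="|" } 1')

# Assigning $1 to itself makes awk rebuild the record, joining the fields with OFS.
rebuilt=$(printf '%s\n' "$line" | awk -F '>' 'BEGIN { OFS="|" } { $1=$1 } 1')

printf '%s\n%s\n' "$untouched" "$rebuilt"
# prints:
# a>b>c
# a|b|c
```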
If you want more than one newline, simply put more than one \n in OFS; to get an empty line before each new paragraph, change it to \n\n.
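As a sketch of that variant, the command below sets OFS to two newlines followed by > and runs it on a hypothetical three-record sample (A1, B2 and C3 are stand-ins, not the real data). Because the line starts with >, the first field is empty, so the output begins with two blank lines.

```shell
# Hypothetical miniature of the question's data.
sample='>A1>B2>C3'

# Same Awk command as above, but with a blank line before each ">".
spaced=$(printf '%s\n' "$sample" | awk -F '>' 'BEGIN { OFS="\n\n>" } { $1=$1 } 1')

printf '%s\n' "$spaced"
```

If the leading blank lines are unwanted, they have to be removed separately, or one of the approaches that skips the first > can be used.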
Answer 3
GNU sed
$ sed 's/>/\n\n&/2g' input_file
POSIXly sed
sed -e '
y/>/\n/
s/\n/>/
s//&&>/g
' input_file
$ perl -pe 's/(?<!^)(?=>)/\n\n/g' input_file
awk -v RS=">" -v ORS= '
NR>1&&sub(/^/,(!k++ ? ORS : "\n\n") RS)
' input_file
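The POSIX sed script in this answer is terse, so here is a commented walkthrough of the same three commands. It is a sketch run on a hypothetical sample (A1, B2 and C3 are stand-ins for the real records) and assumes a sed that accepts \n in y and in the regex, as POSIX specifies.

```shell
# Hypothetical miniature of the question's data.
sample='>A1>B2>C3'

result=$(printf '%s\n' "$sample" | sed -e '
# 1. Turn every ">" into a newline, which cannot already occur in a single input line.
y/>/\n/
# 2. Turn the first newline back into ">", so nothing is inserted before the first record.
s/\n/>/
# 3. The empty regex reuses the last one (\n): each remaining newline becomes two newlines plus ">".
s//&&>/g
')

printf '%s\n' "$result"
# prints:
# >A1
#
# >B2
#
# >C3
```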