I have a 2 GB file. It contains a header followed by a large number of "event" structures. The beginning of the file looks like this:
<run example>
<header>
5
This is header
</header>
<event = 22>
<evhead>
8
3 1 2 0 0 0 0 0
0 0 1 0 1 30 0 1
4 1 4 3 1 0 1 0
0 0 0 0 0 1 1 8
0 1 0 2 1 5 2 0
2 1 3 7 3 1 1 0
1 0 10100 2 3 1 5 1
1 5 1 7 2 3 2 2
</evhead>
0 97
3 11 0 0 3 4 1.94791176123E-14 0.00000000000E+00 -2.75000000000E+01 2.75000000047E+01 5.10000000000E-04
0.00000000000E+00 0.00000000000E+00 0.00000000000E+00 0.00000000000E+00
3 2212 0 0 5 0 -1.94791176123E-14 0.00000000000E+00 9.20000000000E+02 9.20000478451E+02 9.38270000000E-01
0.00000000000E+00 0.00000000000E+00 0.00000000000E+00 0.00000000000E+00
3 11 1 0 0 0 4.63012694434E+00 2.62561831936E+00 -2.31855757639E+01 2.37887130977E+01 5.10000000000E-04
0.00000000000E+00 0.00000000000E+00 0.00000000000E+00 0.00000000000E+00
3 22 1 0 0 0 -4.63012694434E+00 -2.62561831936E+00 -4.31442423592E+00 3.71128690719E+00 -5.75956188088E+00
0.00000000000E+00 0.00000000000E+00 0.00000000000E+00 0.00000000000E+00
3 2212 2 0 0 0 -2.16995636615E-14 -1.11022302463E-15 9.20000000000E+02 9.20000478451E+02 9.38270000000E-01
0.00000000000E+00 0.00000000000E+00 0.00000000000E+00 0.00000000000E+00
3 22 4 0 0 0 -4.60626572550E+00 -2.61208727495E+00 -2.23619853289E+00 5.74815342040E+00 0.00000000000E+00
0.00000000000E+00 0.00000000000E+00 0.00000000000E+00 0.00000000000E+00
</event>
The whole file contains 97,000 of these "event" blocks. I would like to split it into 10 files, each containing the header and 10,000 "event" blocks. Every block has a different index (the indices are arbitrary). The last file will, of course, contain only 7,000 blocks.
I tried several commands from Stack Overflow, such as those in https://stackoverflow.com/questions/8544197/splitting-a-file-in-linux-based-on-content but none of them worked for me.
Here is a larger sample of the file, which I used for all my tests (file for download):
<run example>
<header>
5
header
</header>
<event = 22>
<evhead>
8
3 1 2 0 0 0 0 0
0 0 1 0 1 30 0 1
4 1 4 3 1 0 1 0
</evhead>
0 97
3 11 0 0 3 4 1.94791176123E-14 0.00000000000E+00 -2.75000000000E+01 2.75000000047E+01 5.10000000000E-04
0.00000000000E+00 0.00000000000E+00 0.00000000000E+00 0.00000000000E+00
3 2212 0 0 5 0 -1.94791176123E-14 0.00000000000E+00 9.20000000000E+02 9.20000478451E+02 9.38270000000E-01
</event>
<event = 26>
<evhead>
8
3 1 2 0 0 0 0 0
0 0 1 0 1 30 0 1
4 1 4 3 1 0 1 0
</evhead>
0 52
3 11 0 0 3 4 1.94791176123E-14 0.00000000000E+00 -2.75000000000E+01 2.75000000047E+01 5.10000000000E-04
0.00000000000E+00 0.00000000000E+00 0.00000000000E+00 0.00000000000E+00
3 2212 0 0 5 0 -1.94791176123E-14 0.00000000000E+00 9.20000000000E+02 9.20000478451E+02 9.38270000000E-01
</event>
<event = 31>
<evhead>
8
3 1 2 0 0 0 0 0
0 0 1 0 1 30 0 1
4 1 4 3 1 0 1 0
0 0 0 0 0 1 1 8
</evhead>
0 92
3 11 0 0 3 4 1.94791176123E-14 0.00000000000E+00 -2.75000000000E+01 2.75000000047E+01 5.10000000000E-04
0.00000000000E+00 0.00000000000E+00 0.00000000000E+00 0.00000000000E+00
3 2212 0 0 5 0 -1.94791176123E-14 0.00000000000E+00 9.20000000000E+02 9.20000478451E+02 9.38270000000E-01
0.00000000000E+00 0.00000000000E+00 0.00000000000E+00 0.00000000000E+00
3 11 1 0 0 0 4.39003604933E+00 4.97037860337E+00 -2.04926313413E+01 2.15389187176E+01 5.10000000000E-04
</event>
<event = 37>
<evhead>
8
3 1 2 0 0 0 0 0
0 0 1 0 1 30 0 1
</evhead>
0 77
3 11 0 0 3 4 1.94791176123E-14 0.00000000000E+00 -2.75000000000E+01 2.75000000047E+01 5.10000000000E-04
0.00000000000E+00 0.00000000000E+00 0.00000000000E+00 0.00000000000E+00
3 2212 0 0 5 0 -1.94791176123E-14 0.00000000000E+00 9.20000000000E+02 9.20000478451E+02 9.38270000000E-01
0.00000000000E+00 0.00000000000E+00 0.00000000000E+00 0.00000000000E+00
3 11 1 0 0 0 7.91768942174E+00 3.75815788575E+00 -2.09569980000E+01 2.27158385693E+01 5.10000000000E-04
0.00000000000E+00 0.00000000000E+00 0.00000000000E+00 0.00000000000E+00
</event>
<event = 41>
<evhead>
8
3 1 2 0 0 0 0 0
0 0 1 0 1 30 0 1
4 1 4 3 1 0 1 0
</evhead>
0 122
3 11 0 0 3 4 1.94791176123E-14 0.00000000000E+00 -2.75000000000E+01 2.75000000047E+01 5.10000000000E-04
0.00000000000E+00 0.00000000000E+00 0.00000000000E+00 0.00000000000E+00
3 2212 0 0 5 0 -1.94791176123E-14 0.00000000000E+00 9.20000000000E+02 9.20000478451E+02 9.38270000000E-01
0.00000000000E+00 0.00000000000E+00 0.00000000000E+00 0.00000000000E+00
3 11 1 0 0 0 -3.63469912393E+00 3.95372353695E+00 -1.62133507727E+01 1.70796870892E+01 5.10000000000E-04
0.00000000000E+00 0.00000000000E+00 0.00000000000E+00 0.00000000000E+00
</event>
Answer 1
Using GNU awk:
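# Until the first event is seen, non-header lines (such as <run example>) go to /dev/null.
BEGIN { fname = "/dev/null" }
# Collect the header block once so it can be repeated at the top of every output file.
/<header>/,/<\/header>/ { hdr = hdr $0 "\n"; next }
/^<event / {
    events++
    # Events 1, 10001, 20001, ... each start a new output file.
    if (events % 10000 == 1) {
        if (files++) close(fname)    # close the previous output file, if any
        fname = sprintf("file%02d.txt", files)
        printf "%s", hdr > fname     # hdr already ends with a newline
    }
}
# Every remaining line goes to the current output file.
{ print > fname }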
To run it, save the script to a file named script.awk and then run:
gawk -f script.awk file.txt
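Once the script has finished, a quick sanity check is to count the event blocks in each output file. The following is only a minimal sketch: the helper name count_events is hypothetical, and it assumes that every event starts with a line beginning with "<event ", as in the sample above.

```shell
# Print "<file> <number of event blocks>" for each file given as an argument.
# An event block is recognised by a line starting with "<event ".
count_events() {
    for f in "$@"; do
        printf '%s %s\n' "$f" "$(grep -c '^<event ' "$f")"
    done
}

# Usage, assuming the split files are in the current directory:
#   count_events file*.txt
```

For the 97,000-event file described in the question, this should list ten files: nine with 10000 events and the last with 7000.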