A faster awk script to extract a desired substring

ORDER EVENT .........[] [] ... so many other tags... [Account<25106>=ACCT1] [Destination...] .. so many other tags.

I am trying to extract the account value from lines like the one above. I tried match() in awk, but it seems slow. Can you suggest anything faster than the following?

j = index($0, "<25106>=");
account=substr($0, j + accountTagLength);
account=substr(account,1,index(account, "]") - 1);

The account is not always the second field; field positions can vary.
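
For reference, the fragment above can be made self-contained roughly as follows. This is only a sketch: it assumes the tag text is exactly `<25106>=` and that the value ends at the next `]`. The function name `extract_account` and the variable `tag` are illustrative and do not come from the original script.

```shell
# Hypothetical helper: prints the value that follows "<25106>=" up to the next "]".
extract_account() {
    awk -v tag='<25106>=' '
    {
        j = index($0, tag)                  # 0 when the tag is absent
        if (j == 0) next                    # skip lines without an account tag
        rest = substr($0, j + length(tag))  # text immediately after the tag
        k = index(rest, "]")                # closing bracket of the tag
        if (k > 0) print substr(rest, 1, k - 1)
    }' "$@"
}

# Example call on a made-up line; prints ACCT1
printf '%s\n' 'ORDER EVENT [Account<25106>=ACCT1] [Destination<100>=X]' | extract_account
```

Computing the offset with length(tag) avoids a hard-coded tag length that could drift out of sync with the search string.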

Timings:

bash-3.2$ time head -1000000 temp.log | awk -F'<25106>=' '{print $2}' | sed -e 's/].*//' > /dev/null

real    0m2.410s
user    0m2.782s
sys     0m0.319s
bash-3.2$ time head -1000000 temp.log | awk '{j = index($0, "25106>="); if (j > 0) { account=substr($0, j + 7); substr(account,1,index(account, "]") - 1);} }'

real    0m1.690s
user    0m1.737s
sys     0m0.448s
bash-3.2$ time head -1000000 temp.log | awk '{j = index($0, "25106>="); if (j > 0) { account=substr($0, j + 7); substr(account,1,index(account, "]") - 1);} }'

real    0m1.588s
user    0m1.733s
sys     0m0.179s
bash-3.2$ time head -1000000 temp.log | awk -F'<25106>=' '{print $2}' | sed -e 's/].*//' > /dev/null

real    0m2.384s
user    0m2.762s
sys     0m0.272s
bash-3.2$ time head -1000000 temp.log | awk '{j = index($0, "25106>="); if (j > 0) { account=substr($0, j + 7); substr(account,1,index(account, "]") - 1);} }'

real    0m1.703s
user    0m1.709s
sys     0m0.484s

bash-3.2$ time head -1000000 dumper/cam_verbose.20120220.000.log | gawk 'match($0, /<25106>=([^]]+)/, ary) {account = ary[1]}'

real    0m3.449s
user    0m3.661s
sys     0m0.290s
bash-3.2$ time head -1000000 dumper/cam_verbose.20120220.000.log | gawk 'match($0, /<25106>=([^]]+)/, ary) {account = ary[1]}'

real    0m3.410s
user    0m3.551s
sys     0m0.236s
bash-3.2$ time head -1000000 dumper/cam_verbose.20120220.000.log | gawk 'match($0, /<25106>=([^]]+)/, ary) {account = ary[1]}'

real    0m3.361s
user    0m3.487s
sys     0m0.286s
bash-3.2$ time head -1000000 dumper/cam_verbose.20120220.000.log | awk '{j = index($0, "25106>="); if (j > 0) { account=substr($0, j + 7); substr(account,1,index(account, "]") - 1);} }'

real    0m1.626s
user    0m1.831s
sys     0m0.263s
bash-3.2$ time head -1000000 dumper/cam_verbose.20120220.000.log | awk -F '<25106>=' '{split($2, ary, /\]/); account = ary[1]}'

real    0m2.721s
user    0m2.808s
sys     0m0.265s
bash-3.2$ time head -1000000 dumper/cam_verbose.20120220.000.log | awk -F '<25106>=' '{split($2, ary, /\]/); account = ary[1]}'

real    0m2.787s
user    0m2.863s
sys     0m0.516s
bash-3.2$ time head -1000000 dumper/cam_verbose.20120220.000.log | awk -F '<25106>=' '{split($2, ary, /\]/); account = ary[1]}'

real    0m2.724s
user    0m2.882s
sys     0m0.278s
bash-3.2$ time head -1000000 dumper/cam_verbose.20120220.000.log | awk '{j = index($0, "25106>="); if (j > 0) { account=substr($0, j + 7); substr(account,1,index(account, "]") - 1);} }'

real    0m1.576s
user    0m1.748s
sys     0m0.235s

bash-3.2$ time head -100000 ORDER_EVENTS_CHAS_20120224.log | grep -oE '<25106>=([A-Za-z0-9]*)+' | cut -d= -f2 > /dev/null

real    0m2.098s
user    0m2.131s
sys     0m0.033s
bash-3.2$ time head -100000 ORDER_EVENTS_CHAS_20120224.log | awk '{j = index($0, "25106>="); if (j > 0) { account=substr($0, j + 7); print substr(account,1,index(account, "]") - 1);} }' > /dev/null

real    0m0.253s
user    0m0.275s
sys     0m0.040s
bash-3.2$ time head -100000 ORDER_EVENTS_CHAS_20120224.log | grep -oE '<25106>=([A-Za-z0-9]*)+' | cut -d= -f2 > /dev/null

real    0m2.070s
user    0m2.105s
sys     0m0.034s
bash-3.2$ time head -100000 ORDER_EVENTS_CHAS_20120224.log | grep -oE '<25106>=([A-Za-z0-9]*)+' > /dev/null

real    0m2.065s
user    0m2.090s
sys     0m0.037s
bash-3.2$ time head -1000000 ORDER_EVENTS_CHAS_20120228.log | awk -F'<25106>=' '{ substr($2,0,index($2,"]")-1);}'

real    0m3.426s
user    0m3.637s
sys     0m0.412s
bash-3.2$ time head -1000000 ORDER_EVENTS_CHAS_20120228.log | awk -F'<25106>=' '{ substr($2,0,index($2,"]")-1);}'

real    0m3.463s
user    0m3.603s
sys     0m0.408s
bash-3.2$ time head -1000000 ORDER_EVENTS_CHAS_20120228.log | awk '{j = index($0, "25106>="); if (j > 0) { account=substr($0, j + 7); substr(account,1,index(account, "]") - 1);} }'

real    0m2.247s
user    0m2.307s
sys     0m0.649s

Answer 1

Try this:

awk -F'<25106>=' '{print substr($2,1,index($2,"]")-1);}'

It calls no regex-matching functions, only plain string manipulation.
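
A slightly more defensive sketch of the same idea is shown below. It assumes lines without the tag should produce no output, which the one-liner does not guarantee. The function name `extract_account_fs` is hypothetical.

```shell
# Hypothetical helper: split each line on the tag, then cut the value at "]".
extract_account_fs() {
    awk -F'<25106>=' 'NF > 1 {              # NF > 1 only when the tag occurred
        k = index($2, "]")                  # closing bracket after the value
        if (k > 0) print substr($2, 1, k - 1)   # substr positions start at 1
    }' "$@"
}

# Example call on a made-up line; prints ACCT1
printf '%s\n' 'ORDER EVENT [Account<25106>=ACCT1] [Destination<100>=X]' | extract_account_fs
```

Note that awk treats a multi-character -F value as a regular expression, so the field splitting itself still involves pattern matching. That may be one reason this variant timed slower than the index()/substr() version in the question.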

Answer 2

If you have GNU awk (gawk), you can use the match() function with capturing parentheses:

gawk 'match($0, /<25106>=([^]]+)/, ary) {account = ary[1]}'

Or you can use a more complex field separator:

awk -F '<25106>=' '{split($2, ary, /\]/); account = ary[1]}'
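
The three-argument match() is a gawk extension. Where gawk is not available, a roughly equivalent sketch can rely on the RSTART and RLENGTH variables that the two-argument match() sets in standard awk. The function name `extract_account_match` is hypothetical, and no claim is made here about its speed relative to the other variants.

```shell
# Hypothetical helper: portable match() without gawk's array argument.
extract_account_match() {
    awk 'BEGIN { taglen = length("<25106>=") }
    match($0, /<25106>=[^]]+/) {
        # RSTART and RLENGTH cover the tag plus the value; skip past the tag.
        print substr($0, RSTART + taglen, RLENGTH - taglen)
    }' "$@"
}

# Example call on a made-up line; prints ACCT1
printf '%s\n' 'ORDER EVENT [Account<25106>=ACCT1] [Destination<100>=X]' | extract_account_match
```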

Answer 3

If you only need to print this value, you could try:

echo "ORDER EVENT ......... [Account<25106>=ACCT1]" | awk -F'<25106>=' '{print $2}' | sed -e 's/].*//'

Edit: a sed-only solution:

echo "ORDER EVENT ......... [Account<25106>=ACCT1]" | sed -e 's/.*25106>=//' -e 's/].*//'
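
One caveat with the two-expression form: a line that lacks the tag is still printed, truncated at its first `]`. A possible variant, sketched below, captures the value in a single substitution and prints only the lines where it matched. The function name `extract_account_sed` is hypothetical.

```shell
# Hypothetical helper: print the account value only for lines that carry the tag.
extract_account_sed() {
    # -n suppresses default output; the p flag prints only successful substitutions.
    sed -n 's/.*<25106>=\([^]]*\)].*/\1/p' "$@"
}

# Example call on a made-up line; prints ACCT1
printf '%s\n' 'ORDER EVENT [Account<25106>=ACCT1] [Destination<100>=X]' | extract_account_sed
```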

Edit 2:

awk '{if (split($0, a, "25106>=") > 1) {print substr(a[2], 1, index(a[2], "]")-1)} }'
