How to use unindented lines as the record separator in awk on the command line

Answer 1

Try:

$ gawk 'BEGIN{RS="\n2016"}; /user1/ {print}' input

This produces the output:

2016-05-31 09:54:36 (16667) heritage_w?
  From: ip68-8-49-100.sd.sd.cox.net
  User: user1wizard (wizard)
  Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36
  Referer: http://dbase.apollo3.com/heritage_w?i=290
  #accesses 3,435 (#welcome 415) since 03/07/2012
-05-31 09:54:41 (16677) heritage_w?w=
  From: ip68-8-49-100.sd.sd.cox.net
  User: user1wizard (wizard)
  Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36
  Referer: http://dbase.apollo3.com/heritage_w?
  #accesses 3,436 (#welcome 416) since 03/07/2012

The second record is missing its leading `2016`. That is, of course, because `2016` is part of the record separator and is consumed along with it. To restore it before the record is processed, use:

gawk 'BEGIN{RS="\n2016"} NR>1{$0="2016" $0;} /user1/ {print}' input

Improvement

This version restores whatever text the separator removed from the start of each record:

gawk '{$0=substr(last,2)$0;} /user1/{print} {last=RT}' RS='\n[^[:space:]]' input

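How it works:

`{$0=substr(last,2)$0;}` prepends to `$0` the text that the record separator removed. `substr` is used to drop the leading newline character.
`/user1/{print}` prints the records we are interested in.
`{last=RT}` saves the actual record-separator text so that part of it can be prepended to the next record. `RT` is a GNU extension and is not supported by other versions of awk.
`RS='\n[^[:space:]]'` sets the record separator to a newline followed by a non-whitespace character. Using a regular expression as the record separator works in GNU awk.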

Example:

$ gawk '{$0=substr(last,2)$0;} /user1/{print} {last=RT}' RS='\n[^[:space:]]' input
2016-05-31 09:54:36 (16667) heritage_w?
  From: ip68-8-49-100.sd.sd.cox.net
  User: user1wizard (wizard)
  Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36
  Referer: http://dbase.apollo3.com/heritage_w?i=290
  #accesses 3,435 (#welcome 415) since 03/07/2012
2016-05-31 09:54:41 (16677) heritage_w?w=
  From: ip68-8-49-100.sd.sd.cox.net
  User: user1wizard (wizard)
  Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36
  Referer: http://dbase.apollo3.com/heritage_w?
  #accesses 3,436 (#welcome 416) since 03/07/2012
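
To watch the `RT` bookkeeping at work on something smaller, here is a minimal sketch. The two-record sample file and the variable names `log` and `headers` are made up for illustration, and the block simply skips the demonstration when `gawk` is not installed. It prints only the restored header line of each record.

```shell
# Hypothetical two-record sample, not the original log.
log=$(mktemp)
printf 'A1 head\n  body a\nB2 head\n  body b\n' > "$log"

if command -v gawk >/dev/null 2>&1; then
    headers=$(gawk '
        { $0 = substr(last, 2) $0 }    # give back the character the separator consumed
        { split($0, line, "\n"); print NR ": " line[1] }   # show the header line only
        { last = RT }                  # remember what terminated this record
    ' RS='\n[^[:space:]]' "$log")
    printf '%s\n' "$headers"
else
    echo "gawk not found; RT is a gawk extension" >&2
fi
rm -f "$log"
```

Without the first rule, the second header would come out as `2 head`, because the `B` is matched as part of the separator.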

Answer 2

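This is a slightly different strategy. It accumulates each indented line in a hold buffer. When an unindented line is read, it calls a function that prints the buffer (if it contains the desired pattern) and then replaces the buffer contents with the new header line. The function must also be called when the end of the file is reached.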

#!/usr/bin/awk -f
#   Select records from a file 
#   Each record header line is unindented and each record body line is indented
#   Written by PM 2Ring 2015.06.02

function ShowSelected()
{
    if (hold ~ /User: user1/)
        printf "%s", hold
    hold = $0 ORS
}

/^ /{hold = hold $0 ORS; next}

{ShowSelected()}

END{ShowSelected()}
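
The script hard-codes the pattern. As a sketch of one possible variation, the same hold-buffer logic can take the pattern from a `-v` variable. The name `pat`, the function name `show_selected`, and the sample log with its host names are assumptions made for this example, not part of the original script.

```shell
# Hypothetical sample log: two records, only the first belongs to user1.
log=$(mktemp)
cat > "$log" <<'EOF'
2016-05-31 09:54:36 (16667) heritage_w?
  From: host-a.example
  User: user1wizard (wizard)
2016-05-31 10:02:11 (16700) heritage_w?
  From: host-b.example
  User: user2
EOF

# Same algorithm as the script, with the pattern passed in as pat.
selected=$(awk -v pat='User: user1' '
function show_selected() {
    if (hold ~ pat) printf "%s", hold   # print the finished record if it matches
    hold = $0 ORS                       # start the next record with the current line
}
/^ / { hold = hold $0 ORS; next }       # indented line: append to the current record
     { show_selected() }                # unindented line: the previous record is complete
END  { show_selected() }                # flush the last record at end of file
' "$log")

printf '%s\n' "$selected"
rm -f "$log"
```

This uses nothing beyond standard awk features, so it should not need gawk.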

Here is a one-line version:

awk 'function S(){if(h~/User: user1/)printf "%s",h; h=$0 ORS}; /^ /{h=h $0 ORS; next}; {S()};END{S()}'

Just for fun, here is a sed version. It uses essentially the same algorithm.

sed '/^ /!bA;H;$bA;d;:A;x;/User: user1/!d'

And here it is with comments:

#!/bin/sed -f    
#   Select records from a file 
#   Each record header line is unindented and each record body line is indented
#   Written by PM 2Ring 2015.06.02

# If line doesn't start with a space, branch to the select & display routine
/^ /!bA

# Append pattern space (i.e., the current line) to the hold space
H

# If this is the last line, branch to the select & display routine
$bA

# Delete the pattern space and start the next cycle
d

# The select & display routine
:A

# Exchange the contents of the hold and pattern spaces
x

# Delete the pattern if it doesn't contain the regex /User: user1/
# if the pattern isn't deleted it will be printed
/User: user1/!d
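
To see what the `$bA` branch contributes, here is a small experiment on a hypothetical two-record sample in which only the last record matches. The commands are given as separate `-e` expressions, which should behave the same as the script file. The variable names `with_flush` and `without_flush` are made up for this sketch.

```shell
# Hypothetical sample: the matching record is the last one in the file.
log=$(mktemp)
cat > "$log" <<'EOF'
2016-05-31 09:54:36 (16667) heritage_w?
  User: user2
2016-05-31 09:54:41 (16677) heritage_w?
  User: user1wizard (wizard)
EOF

# The full script: the last line also branches to the select & display routine.
with_flush=$(sed -e '/^ /!bA' -e 'H' -e '$bA' -e 'd' -e ':A' -e 'x' \
                 -e '/User: user1/!d' "$log")

# The same script with the $bA line left out: the final record stays in the hold space.
without_flush=$(sed -e '/^ /!bA' -e 'H' -e 'd' -e ':A' -e 'x' \
                    -e '/User: user1/!d' "$log")

printf 'with $bA:\n%s\n' "$with_flush"
printf 'without $bA:\n%s\n' "$without_flush"
rm -f "$log"
```

With the branch in place the last record is printed. Without it nothing is printed, since a record is only examined when the next header line arrives.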

This is a sed/awk hybrid inspired by Thor's idea of using sed to do some preprocessing. It prefixes each unindented line with a `\xff` character and then uses that as the awk record separator. It won't work if the log file itself uses `\xff`, but hopefully it doesn't. :)

<logfile sed 's/^[^ ]/\xff&/' | awk 'BEGIN{RS="\xff";ORS=""};/User: user1/'
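
The `\xff` escape in a sed replacement is not something every sed understands. As a sketch of one possible variant, the marker can be produced with `printf` and written in octal on the awk side. This swaps in `\001` as the marker byte, which is an assumption of this example (it must never occur in the log), and the sample log and the names `sep` and `hybrid` are hypothetical.

```shell
# Hypothetical sample log: two records, only the first belongs to user1.
log=$(mktemp)
cat > "$log" <<'EOF'
2016-05-31 09:54:36 (16667) heritage_w?
  User: user1wizard (wizard)
2016-05-31 10:02:11 (16700) heritage_w?
  User: user2
EOF

# Marker byte, assumed never to appear in the log itself.
sep=$(printf '\001')

# sed puts the marker in front of every unindented line;
# awk then splits on it and prints matching records unchanged.
hybrid=$(sed "s/^[^ ]/${sep}&/" "$log" |
         awk 'BEGIN { RS = "\001"; ORS = "" } /User: user1/')

printf '%s\n' "$hybrid"
rm -f "$log"
```

The first awk record is empty, because the marker precedes the very first line, and an empty record never matches the pattern.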

Answer 3

Preprocess the file with, for example, sed. So, to extract the second line of each record, do this:

<infile sed 's/^[^ ]/\n&/' | awk '{ print $2 }' RS= FS='\n'

Output:

  From: ip68-8-49-100.sd.sd.cox.net
  From: ip68-8-49-100.sd.sd.cox.net
  From: ubunzeus
  From: ubunzeus

Edit - How can I print all records where `$3` contains `user1`?

<infile sed '1!s/^[^ ]/\n&/' | awk '$3 ~ /user1/' RS= FS='\n'

Output:

2016-05-31 09:54:36 (16667) heritage_w?                                
  From: ip68-8-49-100.sd.sd.cox.net
  User: user1wizard (wizard)
  Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36
  Referer: http://dbase.apollo3.com/heritage_w?i=290
  #accesses 3,435 (#welcome 415) since 03/07/2012
2016-05-31 09:54:41 (16677) heritage_w?w=
  From: ip68-8-49-100.sd.sd.cox.net
  User: user1wizard (wizard)
  Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36
  Referer: http://dbase.apollo3.com/heritage_w?
  #accesses 3,436 (#welcome 416) since 03/07/2012
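
Testing `$3` instead of the whole record matters when `user1` can also turn up on another line. The sketch below uses a hypothetical sample in which the second record mentions `user1` only on its `From:` line. It also writes the inserted newline as a backslash followed by a real line break, which is an assumption made here for seds that do not accept `\n` in the replacement. The variable names are made up.

```shell
# Hypothetical sample: record 2 mentions user1 on the From: line only.
log=$(mktemp)
cat > "$log" <<'EOF'
2016-05-31 09:54:36 (16667) heritage_w?
  From: host-a.example
  User: user1wizard (wizard)
2016-05-31 10:02:11 (16700) heritage_w?
  From: user1-laptop.example
  User: user2
EOF

# Insert a blank line before every unindented line except the first.
paragraphs=$(sed '1!s/^[^ ]/\
&/' "$log")

# With RS= (paragraph mode) and FS set to newline, $3 is the third line of a record.
by_field=$(printf '%s\n' "$paragraphs" | awk '$3 ~ /user1/ { print $1 }' RS= FS='\n')

# A whole-record match also picks up the From: line of record 2.
anywhere=$(printf '%s\n' "$paragraphs" | awk '/user1/ { print $1 }' RS= FS='\n')

printf 'third line matches:\n%s\n' "$by_field"
printf 'any line matches:\n%s\n' "$anywhere"
rm -f "$log"
```

Only the header of the first record is listed for the `$3` test, while the whole-record test lists both headers.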

Answer 4

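IMO the easiest way is to use sed to convert the input into paragraph-separated records (one or more blank lines between each record). That is, skip the first line and insert a newline before each line that doesn't begin with whitespace (a space or a tab).

awk can then be told to use two or more newlines as the input record separator (RS) with `RS='\n\n+'`.

BTW, there is no need to set the output record separator (ORS) to the same value unless you want the output to be in paragraphs too. You didn't ask for that, so I haven't included it. If that is what you want (e.g. because you want to do further processing on the output), add `-v ORS='\n\n'` to the awk options.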

$ sed -e '2,$ s/^[^[:blank:]]/\n&/' ldjames.txt | 
    awk -v RS='\n\n+' '/user1/ {print}'
2016-05-31 09:54:36 (16667) heritage_w?
  From: ip68-8-49-100.sd.sd.cox.net
  User: user1wizard (wizard)
  Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36
  Referer: http://dbase.apollo3.com/heritage_w?i=290
  #accesses 3,435 (#welcome 415) since 03/07/2012
2016-05-31 09:54:41 (16677) heritage_w?w=
  From: ip68-8-49-100.sd.sd.cox.net
  User: user1wizard (wizard)
  Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36
  Referer: http://dbase.apollo3.com/heritage_w?
  #accesses 3,436 (#welcome 416) since 03/07/2012
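
The effect of `ORS` shows up once the output is fed to a second paragraph-aware stage. The following is a rough sketch on a hypothetical three-record sample. It uses awk's paragraph mode (`RS=`) as a stand-in for `RS='\n\n+'` and a backslash-newline in the sed replacement, both of which are substitutions made for this example. The second awk merely counts the records it receives.

```shell
# Hypothetical sample: records 1 and 3 belong to user1.
log=$(mktemp)
cat > "$log" <<'EOF'
2016-05-31 09:54:36 (16667) heritage_w?
  User: user1wizard (wizard)
2016-05-31 10:02:11 (16700) heritage_w?
  User: user2
2016-05-31 10:05:00 (16710) heritage_w?
  User: user1wizard (wizard)
EOF

# Blank line before every unindented line from line 2 onwards.
paragraphs=$(sed '2,$s/^[^[:blank:]]/\
&/' "$log")

# Default ORS: the selected records are printed back to back.
count_default=$(printf '%s\n' "$paragraphs" |
    awk -v RS= '/user1/' |
    awk -v RS= 'END { print NR }')

# ORS set to a blank line: the next stage still sees separate records.
count_paragraph=$(printf '%s\n' "$paragraphs" |
    awk -v RS= -v ORS='\n\n' '/user1/' |
    awk -v RS= 'END { print NR }')

echo "records seen downstream, default ORS: $count_default"
echo "records seen downstream, ORS of two newlines: $count_paragraph"
rm -f "$log"
```

With the default `ORS` the two selected records run together and the second stage sees a single record. With `ORS='\n\n'` it sees two.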
