How do I count the number of times a particular phrase appears in a file and format the result neatly?

Answer 1

For the updated question:

awk '
/"rarely_used_module"/ && /OUT:/ { nc[$NF]++ ; c++ }
END {
    printf "Number of license checkouts for rarely_used_module: %d\n", c
    for (i in nc) printf "User: %s (%d)\n", i, nc[i]
}
' logfile.txt

This produces the following output:

Number of license checkouts for rarely_used_module: 4
User: [email protected] (2)
User: [email protected] (2)
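
For comparison, the same filter-and-count logic can be sketched in Python. This is only a minimal sketch: the sample lines, the user names, and the helper name count_checkouts are made up for illustration, and the real log layout may differ. The only things it takes from the awk one-liner are the quoted module name, the OUT: marker, and the user being the last field.

```python
from collections import Counter


def count_checkouts(lines, module):
    """Count OUT: lines mentioning the quoted module, keyed by the last field (user)."""
    per_user = Counter()
    needle = '"%s"' % module  # the awk pattern also matches the quoted name
    for line in lines:
        fields = line.split()
        if fields and needle in line and "OUT:" in line:
            per_user[fields[-1]] += 1
    return sum(per_user.values()), per_user


# Hypothetical sample lines (assumption: not taken from the original log).
sample = [
    '10:00:01 (daemon) OUT: "rarely_used_module" alice@host1',
    '10:05:12 (daemon) OUT: "rarely_used_module" bob@host2',
    '10:07:40 (daemon) IN: "rarely_used_module" alice@host1',
    '10:09:03 (daemon) OUT: "certain_module" alice@host1',
    '10:11:59 (daemon) OUT: "rarely_used_module" alice@host1',
]

total, per_user = count_checkouts(sample, "rarely_used_module")
print("Number of license checkouts for rarely_used_module: %d" % total)
for user, count in sorted(per_user.items()):
    print("User: %s (%d)" % (user, count))
```

Sorting the users before printing gives a stable order, which awk's for (i in nc) does not guarantee.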

I have left the original answer below to show how the code can be extended as the requirements grow.

Here is an example of how to do this kind of task with awk.

awk '
BEGIN { SUBSEP = ", " ; OFS = ": " }
{ m[$(NF-1)]++ }
{ n[$(NF-1)] = n[$(NF-1)] " " $NF }
{ nc[$(NF-1),$NF]++ }
END {
    print "\n=== count modules:"
    for (i in m) print i, m[i]
    print "\n=== collect names using modules:"
    for (i in n) print i, n[i]
    print "\n=== count names using modules:"
    for (i in nc) print i, nc[i]
}
' logfile.txt

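Explanation:

{ m[$(NF-1)]++ } - increments the counter for the second-to-last field (the module) of each input line.
{ n[$(NF-1)] = n[$(NF-1)] " " $NF } - appends the last field (the name) to the string kept for each key (module).
{ nc[$(NF-1),$NF]++ } - increments the counter for the (module, name) key tuple; awk joins the two parts with SUBSEP to form a single key.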
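
To make the three arrays above more concrete, here is a rough Python sketch of the same bookkeeping. It is an illustration under assumptions, not a translation of anyone's production code: the function name tally and the sample lines are hypothetical, and a tuple stands in for awk's SUBSEP-joined key.

```python
from collections import Counter, defaultdict


def tally(lines):
    """Build the three tables that the awk arrays m, n and nc hold."""
    m = Counter()          # module -> number of lines
    n = defaultdict(list)  # module -> names, in order of appearance
    nc = Counter()         # (module, name) -> number of lines
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # not enough fields to hold a module and a name
        module, name = fields[-2], fields[-1]
        m[module] += 1
        n[module].append(name)
        nc[(module, name)] += 1
    return m, n, nc


# Hypothetical sample lines (assumption: the real log has more leading fields).
sample = [
    'OUT: "certain_module" alice@host1',
    'OUT: "certain_module" alice@host1',
    'OUT: "certain_module" bob@host2',
    'OUT: "different_module" carol@host3',
]

m, n, nc = tally(sample)
for (module, name), count in sorted(nc.items()):
    print("%s, %s: %d" % (module, name, count))
```

As in the awk version, the module keys keep their surrounding double quotes because nothing strips them.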

With the example data, this produces the following output:

=== count modules:
"rarely_used_module": 1
"different_module": 2
"certain_module": 3

=== collect names using modules:
"rarely_used_module":  [email protected]
"different_module":  [email protected] [email protected]
"certain_module":  [email protected] [email protected] [email protected]

=== count names using modules:
"different_module", [email protected]: 1
"different_module", [email protected]: 1
"certain_module", [email protected]: 2
"rarely_used_module", [email protected]: 1
"certain_module", [email protected]: 1

Answer 2

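When I need something more complex than changing or matching every line, I use Python, because it is a general-purpose language. It may be more verbose than awk (btw, there is a Python awk, pawk), but it also gives you well-documented code that is easy to extend.

Here is a Python 2 script that does the job: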

from __future__ import print_function  # lets the script run under Python 2 and 3

from collections import defaultdict

FILE = 'module.txt'

# Global table of usages is
# dict [ module_name ] -> dict [ user_name ] -> count
usage = defaultdict(lambda: defaultdict(int))

# Read, parse data and add usage count where needed
with open(FILE) as f:
    for line in f:
        # Split on whitespace and pick the last 2 fields,
        # stripping unnecessary characters
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        user = fields[-1].rstrip()
        module_name = fields[-2].strip('"')

        usage[module_name][user] += 1

# Now print pretty results
for module_name, module_usage in usage.items():
    print('====> ', module_name)
    for user, count in module_usage.items():
        print('\t', user, count)

For the sample, it prints the following data:

====>  different_module
        [email protected] 1
        [email protected] 1
====>  rarely_used_module
        [email protected] 1
====>  certain_module
        [email protected] 2
        [email protected] 1
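
The order of modules and users in a report like this depends on dictionary iteration order, so it can change between runs or Python versions. If a stable, tidier report is wanted, one option is to sort before printing. The following is a minimal sketch: the helper name format_usage and the sample table are hypothetical, and the table is assumed to have the same shape as the usage table, module name to user name to count.

```python
def format_usage(usage):
    """Return report lines with modules sorted by name and users by descending count."""
    lines = []
    for module_name in sorted(usage):
        lines.append("====>  %s" % module_name)
        users = usage[module_name]
        # Highest count first; ties are broken alphabetically by user name.
        for user, count in sorted(users.items(), key=lambda kv: (-kv[1], kv[0])):
            lines.append("\t%s %d" % (user, count))
    return lines


# Hypothetical usage table (assumption: same shape as the nested dict of counts).
usage = {
    "certain_module": {"bob@host2": 1, "alice@host1": 2},
    "different_module": {"carol@host3": 1},
}

report = format_usage(usage)
print("\n".join(report))
```

Returning the lines instead of printing them directly also makes the formatting easy to check or to write to a file.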
