Filtering duplicate entries that differ only by timestamp, using AWK

Question 1

$ tac file | awk '!seen[substr($0,1,length()-25)]++'
archive-daily/document-deb-report-2022-07-18-10-04-21.html
archive-daily/document-loan-report-2022-07-18-17-07-26.html
archive-daily/document-sell-report-2022-07-13-23-15-34.html

Question 2

Using sed and tac

$ sed -En 'G;/^(([^-]*-){3}).*\n.*\n\1/d;H;P' <(tac input_file)
archive-daily/document-sell-report-2022-07-13-23-15-34.html
archive-daily/document-loan-report-2022-07-18-17-07-26.html
archive-daily/document-deb-report-2022-07-18-10-04-21.html

Question 3

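Using AWK: the program remembers the prefix of the previous file name, extracts the prefix of the current one, compares the two, and prints the previous file name whenever the prefix changes.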

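# myuniq.awk
# Print the last line of each run of consecutive lines that share a prefix.

BEGIN {
    # The number 0 marks "nothing seen yet"
    last_prefix = 0
    last_line = 0
}

{
    # Skip lines that do not end in six hyphen-separated numbers plus .html
    if (match($0, /(-[[:digit:]]+){6}\.html$/) == 0)
        next

    # Everything before the timestamp is the prefix
    prefix = substr($0, 1, RSTART - 1)
    if (last_prefix != 0 && prefix != last_prefix)
        print last_line

    last_prefix = prefix
    last_line = $0
}

END {
    # Flush the final group
    if (last_line != 0)
        print last_line
}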

$ cat files.txt
archive-daily/document-sell-report-2022-07-12-23-21-02.html
archive-daily/document-sell-report-2022-07-13-23-15-34.html
archive-daily/document-loan-report-2022-07-18-05-12-16.html
archive-daily/document-loan-report-2022-07-18-17-07-26.html
archive-daily/document-deb-report-2022-07-18-13-17-40.html
archive-daily/document-deb-report-2022-07-18-10-04-21.html
$ awk -f myuniq.awk < files.txt
archive-daily/document-sell-report-2022-07-13-23-15-34.html
archive-daily/document-loan-report-2022-07-18-17-07-26.html
archive-daily/document-deb-report-2022-07-18-10-04-21.html

Question 4

Using Raku (formerly known as Perl_6)

~$ raku -e 'my @a = lines>>.split(/ "/" | [ <?after report> \- ]/);  \
            my %h.=append: [Z=>] @a>>[1], @a.map(*.[2]);  \
            for %h.sort {say (.key, .value.max).join("-")};'   file.txt

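This answer simplifies the problem a little and assumes that all files to be analyzed live in the same directory, so the first step is to split off the directory. Note that the code works fine with the directory included (it just makes for a longer key). The sample output below is returned in alphabetical order by report name.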

Sample input:

archive-daily/document-sell-report-2022-07-12-23-21-02.html
archive-daily/document-sell-report-2022-07-13-23-15-34.html
archive-daily/document-loan-report-2022-07-18-05-12-16.html
archive-daily/document-loan-report-2022-07-18-17-07-26.html
archive-daily/document-deb-report-2022-07-18-13-17-40.html
archive-daily/document-deb-report-2022-07-18-10-04-21.html

Sample output:

document-deb-report-2022-07-18-13-17-40.html
document-loan-report-2022-07-18-17-07-26.html
document-sell-report-2022-07-13-23-15-34.html

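In the first statement, lines are read in and (destructively) split on either a / slash or the - hyphen that follows the word "report", then stored in the @a array. A %h hash is declared and three elements are appended to it with append. The two columns are "Zip-reduced" with [Z] to pair up the respective data, and the => "fat arrow" denotes the key-value relationship, so the compound metaoperator [Z=>] is used. The first element becomes the key and the second becomes the value. The max of each value list is then computed, and the results are joined with join and returned.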
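
For readers without Raku installed, the same group-then-take-the-maximum idea can be sketched in portable awk. This is only an illustration, not the answer's own method: the function name latest_per_report is hypothetical, and the sketch assumes every name ends in a fixed-width -YYYY-MM-DD-hh-mm-ss.html suffix (25 characters), so that plain string comparison of the zero-padded timestamps agrees with chronological order.

```shell
# latest_per_report: hypothetical helper, not part of the original answer.
# Reads file names on stdin and prints, for each report prefix, the name
# carrying the largest timestamp. Output is sorted alphabetically.
latest_per_report() {
    awk '
    {
        name = $0
        sub(/^.*\//, "", name)            # drop the directory part, if any
        # Assumption: the last 25 characters are "-YYYY-MM-DD-hh-mm-ss.html"
        cut = length(name) - 25
        key = substr(name, 1, cut)        # e.g. document-sell-report
        stamp = substr(name, cut + 2)     # timestamp plus ".html"
        # Zero-padded, fixed-width stamps compare correctly as strings
        if (!(key in best) || stamp > best[key])
            best[key] = stamp
    }
    END {
        for (key in best)
            print key "-" best[key]
    }' | sort
}

# Usage: latest_per_report < file.txt
```

Fed the sample input above, this prints the same three names as the Raku one-liner, including the 13-17-40 entry for document-deb-report.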

Here is where it gets interesting: Raku has ISO-8601 DateTimes built in, so the second element can be rewritten with subst so that it is recognized as a DateTime object. That way you get a true DateTime max:

~$ raku -e 'my @a = lines>>.split(/ "/" | [ <?after report> \- ]/);  \
            my %h.=append: [Z=>] @a>>[1], @a.map(*.[2].subst(/ \- (\d**2) \- (\d**2) \- (\d**2) \.html $/, {"T$0:$1:$2"} ).DateTime); \
            "".put; for %h.sort {say (.key => .value.max)};'  file.txt

document-deb-report => 2022-07-18T13:17:40Z
document-loan-report => 2022-07-18T17:07:26Z
document-sell-report => 2022-07-13T23:15:34Z

For anyone looking closely: every report returns its max date/time. It is just that the OP's data is out of order (as @QuartzCristal pointed out).
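
The reshaping that the subst call performs can also be sketched in portable awk. This is a rough illustration under stated assumptions, not an equivalent of Raku's DateTime: awk has no date type, so the sketch only rewrites the timestamp into ISO-8601 text and still compares strings. The function name latest_iso_per_report is hypothetical, and the trailing Z merely mirrors the Raku output shown above; the file names themselves carry no time zone.

```shell
# latest_iso_per_report: hypothetical helper, not part of the original answer.
# Prints "<report> => <latest timestamp in ISO-8601 form>" for each report.
latest_iso_per_report() {
    awk '
    {
        name = $0
        sub(/^.*\//, "", name)            # drop the directory part, if any
        sub(/\.html$/, "", name)          # drop the extension
        # Assumption: the last 20 characters are "-YYYY-MM-DD-hh-mm-ss"
        cut = length(name) - 20
        key = substr(name, 1, cut)
        stamp = substr(name, cut + 2)     # YYYY-MM-DD-hh-mm-ss
        split(stamp, p, "-")
        # "Z" is an assumption copied from the Raku output, not from the data
        iso = p[1] "-" p[2] "-" p[3] "T" p[4] ":" p[5] ":" p[6] "Z"
        if (!(key in best) || iso > best[key])
            best[key] = iso
    }
    END {
        for (key in best)
            print key " => " best[key]
    }' | sort
}

# Usage: latest_iso_per_report < file.txt
```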

https://docs.raku.org/language/hashmap#Mutable_hashes_and_immutable_maps
https://docs.raku.org/language/operators#index-entry-[]_(reduction_metaoperators)
https://docs.raku.org/type/DateTime
https://raku.org

Answer