Finding records with duplicate keys

Answer 1

I created a test source file:

assocId=1
IMPI=XXX
IMPU=YYY
MSISDN=ZZZ
IMSI=PPP
assocId=2
IMPI=ddd
IMPI=eee
IMPU=fff
IMPU=ggg
IMSI=hhh
IMSI=iii
MSISDN=jjj
MSISDN=kkk
assocId=3
IMPI=XXX
IMPU=YYY
MSISDN=ZZZ
IMSI=PPP
assocId=4
IMPI=ddd
IMPI=eee
IMPU=fff
IMPU=ggg
IMSI=hhh
IMSI=iii
MSISDN=jjj
MSISDN=kkk

Then I wrote the following GAWK script:

#!/usr/bin/gawk -f
#
# Define the processing for a change of associd.
#
# NB: This function uses the GLOBAL variables:
#       IMPI
#       IMPU
#       IMSI
#       MSISDN
#
function new_assoc(assoc,     flag) {
        flag = 0
        if (IMPI > 1) flag=1
        if (IMPU > 1) flag=1
        if (IMSI > 1) flag=1
        if (MSISDN > 1) flag=1
        if (flag > 0) printf( "Found a multiple entry: %d\n", assoc )
        IMPI = IMPU = IMSI = MSISDN = 0
}
#
#       First thing, set up the field separator.
#
BEGIN {
        FS = "="
}
#
#       Every time we hit an assoc line handle the previous one and then
#       initialise.
#
/^assocId/ {
        new_assoc( assoc )
        assoc = $2
}
#
#       Total up the four entries:
#
/^IMPI/   { IMPI++   }
/^IMPU/   { IMPU++   }
/^IMSI/   { IMSI++   }
/^MSISDN/ { MSISDN++ }
#
#       Ensure we process the last assoc on EOF:
#
END {
        new_assoc( assoc )
}

When I run it:

$ ./scan_it <src
Found a multiple entry: 2
Found a multiple entry: 4

I hope this gives you a basis for what you need to do.

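Answer 2

The following awk program outputs the assocId of every record that contains duplicate keys. The code is logically almost identical to the code in Martin's answer, except that it looks for duplicates of any key in a record.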

BEGIN { FS = "=" }

function validate() {
    # Outputs a message if any key in "keys" is associated
    # with a number greater than 1.

    for (key in keys)
        if (keys[key] > 1) {
            printf "Check assocId=%s\n", id
            break
        }
}

/^assocId=/ {
    # New record.
    # Validate the previous record and delete the count of keys.
    validate()
    id = $2
    delete keys
}

{
    # Increment the counter for this key.
    keys[$1]++
}

END {
    # Validate the last record.
    validate()
}

As an unreadable one-liner:

awk -F = 'function v(){for(k in c)if(c[k]>1){printf "Check assocId=%s\n",id;break}}/^assocId=/{v();id=$2;delete c}{c[$1]++}END{v()}'

Running this on the same test data that Martin used, you get the following output:

Check assocId=2
Check assocId=4
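
A portability note, offered as a sketch rather than a correction: deleting a whole array with "delete keys" works in gawk and most current awk implementations, but some older ones may reject it. Assuming that matters in your environment, clearing the array with split("", keys) should behave the same way. The wrapper name find_dups is made up for this illustration.

```sh
#!/bin/sh
# find_dups is a hypothetical wrapper name, not part of the original answer.
# Same logic as the program above, but the per-record counters are cleared
# with split("", keys) instead of "delete keys".
find_dups() {
    awk -F= '
    function validate(    key) {
        # Report the current record once if any key was seen more than once.
        for (key in keys)
            if (keys[key] > 1) {
                printf "Check assocId=%s\n", id
                break
            }
    }
    /^assocId=/ {
        validate()          # check the previous record
        id = $2
        split("", keys)     # portable way to empty the array
    }
    { keys[$1]++ }
    END { validate() }      # check the last record
    '
}

# Small inline sample; record 2 repeats IMPI.
printf '%s\n' assocId=1 IMPI=a IMPU=b assocId=2 IMPI=c IMPI=d | find_dups
```

With the sample input in the last line, only record 2 repeats a key, so it is the only one reported.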

Answer 3

Similar to the previous solutions:

function count() {
    if (impi > 1) {
        print associd, "with impi repeated ", impi, "times"
    }
    
    if (impu > 1) {
        print associd, "with impu repeated ", impu, "times"
    }

    if (msisdn > 1) {
        print associd, "with msisdn repeated ", msisdn, "times"
    }
}

/assocId/ {
    count()
    impi = 0
    impu = 0
    msisdn = 0
    associd = $0
}

/IMPI/ {
    impi += 1
}

/IMPU/ {
    impu += 1
}

/MSISDN/ {
    msisdn += 1
}

END {
    count()
}

assocId=2 with impi repeated  2 times
assocId=2 with impu repeated  2 times
assocId=2 with msisdn repeated  2 times
assocId=4 with impi repeated  2 times
assocId=4 with impu repeated  2 times
assocId=4 with msisdn repeated  2 times

However, I would like a way to call count() only once.
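
One possible way to get there, sketched under a few assumptions: instead of checking at every assocId line and again at the end, keep the counters indexed by record ID and key, and do all the reporting in a single END block. This assumes every assocId value is unique and that the whole input fits comfortably in memory. The names report_dups, order and keyorder are invented for this sketch. Unlike the script above, it counts every key it sees (including IMSI) and prints the key name as it appears in the input.

```sh
#!/bin/sh
# report_dups is a hypothetical wrapper name. The awk program counts every
# key per record while reading, then reports once, from END only.
report_dups() {
    awk -F= '
    /^assocId=/ {
        id = $2
        order[++n] = id             # remember records in input order
        next
    }
    {
        count[id, $1]++             # counter indexed by (record ID, key)
        if (!($1 in known)) {       # remember keys in first-seen order
            known[$1] = 1
            keyorder[++m] = $1
        }
    }
    END {
        # The only place where the duplicate check runs.
        for (i = 1; i <= n; i++)
            for (j = 1; j <= m; j++) {
                c = count[order[i], keyorder[j]]
                if (c > 1)
                    printf "assocId=%s with %s repeated %d times\n", order[i], keyorder[j], c
            }
    }
    '
}

# Small inline sample; record 2 repeats IMPI only.
printf '%s\n' assocId=1 IMPI=a IMPU=b assocId=2 IMPI=c IMPI=d IMPU=e | report_dups
```

To run it against the test file used above, call the function as: report_dups <src. The trade-off is that nothing is printed until the whole input has been read.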

