I have a file and I want to know whether it contains identical entries.
The file contains entries like these:
dn: cn=ccb2fa1a-6efb-4f29-b18b-72e226d76935,ou=Named,ou=Identities,ou=Active,o
rdcPosition: cn=936480,ou=Entities,ou=Active,ou=Vault,o=rdc#3#<position><cn>70
dn: cn=715f55d1-e940-42f9-8ae5-25ff1eff6f55,ou=Named,ou=Identities,ou=Active,o
rdcPosition: cn=7292,ou=Entities,ou=Active,ou=Vault,o=rdc#3#<position><cn>4024
rdcPosition: cn=8910,ou=Entities,ou=Active,ou=Vault,o=rdc#3#<position><cn>5209
rdcPosition: cn=7263,ou=Entities,ou=Active,ou=Vault,o=rdc#3#<position><cn>6725
rdcPosition: cn=936480,ou=Entities,ou=Active,ou=Vault,o=rdc#3#<position><cn>11
dn: cn=f61e2769-a9c8-486a-914b-92333055b5e5,ou=Named,ou=Identities,ou=Active,o
rdcPosition: cn=938936,ou=Entities,ou=Active,ou=Vault,o=rdc#3#<position><cn>74
rdcPosition: cn=942380,ou=Entities,ou=Active,ou=Vault,o=rdc#5#<position><cn>51
dn: cn=7548d048-1288-4b66-97f4-efe15c68fc50,ou=Named,ou=Identities,ou=Active,o
rdcPosition: cn=311432,ou=Entities,ou=Active,ou=Vault,o=rdc#3#<position><cn>43
dn: cn=e51f3d78-b9d8-4bcf-b8c5-321519f19515,ou=Named,ou=Identities,ou=Active,o
rdcPosition: cn=938936,ou=Entities,ou=Active,ou=Vault,o=rdc#3#<position><cn>35
dn: cn=cf6ddfb2-4261-4169-9e6e-0d6963262b49,ou=Named,ou=Identities,ou=Active,o
rdcPosition: cn=938936,ou=Entities,ou=Active,ou=Vault,o=rdc#3#<position><cn>82
I need to know whether any of the "rdcPosition" lines under a "dn:" line are duplicated, for example:
dn: cn=65fb5990-4d2f-492e-83fb-c2cbd72d8988,ou=Named,ou=Identities,ou=Active,o
rdcPosition: cn=7688,ou=Entities,ou=Active,ou=Vault,o=rdc#3#<position><cn>2323
rdcPosition: cn=7688,ou=Entities,ou=Active,ou=Vault,o=rdc#3#<position><cn>2323
Does anyone know which Unix command I should use?
Answer 1
This is the kind of quick script I write every day:
#!/usr/bin/perl
#
use strict;
use warnings;

# data structures we're going to need
my %positions;         # how many times we have seen a given position
my %registered_lines;  # the concatenated lines for the given position
my $dn = '';           # the current dn section we're in

# print the dn line (once) and all lines of every position seen more than once
sub report_duplicates
{
    my $printed = 0;   # we want to print the dn line only once
    foreach my $key (sort keys %positions)   # all positions seen in the section
    {
        if ($positions{$key} > 1)   # numeric comparison (gt would compare strings)
        {
            print $dn unless $printed;
            $printed = 1;
            #print "position $key is repeated $positions{$key} times\n";
            print $registered_lines{$key};   # print all the lines with the position
        }
    }
}

while (<>)
{
    if (/^dn:/)   # beginning of a new dn section (and end of the previous one)
    {
        report_duplicates();
        # reset variables for the next section
        $dn = $_;
        %positions = ();
        %registered_lines = ();
    }
    elsif (/^rdcPosition/ && /(\d+)$/)   # rdcPosition line ending in digits
    {
        my $pos = $1;                     # the digits at the end of the line
        $positions{$pos}++;               # increment the counter
        $registered_lines{$pos} .= $_;    # record the line
    }
}
report_duplicates();   # the last section is not followed by another dn: line
Run it like this:
perl script.pl < input_data_file
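Answer 2
If all you care about is "are there any duplicates?", I would suggest comparing the line count from
cat <file> | sort | wc -l
with the count from
cat <file> | sort | uniq | wc -l
If there are duplicates, uniq removes them and the second count is lower. To see which lines differ, check out the Perl script posted by @Igeorget.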
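For what it's worth, the count comparison could be wrapped up roughly like this. It is only a sketch: the function name has_duplicates and the sample data are made up for illustration. Keep in mind that it compares lines across the whole file, so the same rdcPosition line appearing under two different dn: entries also counts as a duplicate.

```shell
#!/bin/sh
# Hypothetical helper: exit status 0 if the file contains any repeated line.
has_duplicates() {
    # tr strips the padding that some wc implementations add around the number
    total=$(sort "$1" | wc -l | tr -d ' ')
    unique=$(sort "$1" | uniq | wc -l | tr -d ' ')
    [ "$total" -ne "$unique" ]
}

# Illustrative sample data, not the real directory export.
sample=$(mktemp)
cat > "$sample" <<'EOF'
dn: cn=example-1
rdcPosition: cn=7688,o=rdc#3#<position><cn>2323
rdcPosition: cn=7688,o=rdc#3#<position><cn>2323
dn: cn=example-2
rdcPosition: cn=7292,o=rdc#3#<position><cn>4024
EOF

if has_duplicates "$sample"; then
    echo "duplicates found"
else
    echo "no duplicates"
fi
rm -f "$sample"
```

If you only need to see the repeated lines themselves, sort <file> | uniq -d prints one copy of each.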
Answer 3
awk '/^dn:/ {d=1} {if (d) {print buf | "sort|uniq -d"; close("sort|uniq -d"); d=0; buf=""} else {buf=buf$0"\n"}} END {print buf | "sort|uniq -d"}'|grep -v '^$'
Much less typing than the Perl version =). It could be simpler, but it seems you cannot make an awk rule run "on some pattern or at the end", so a bit of shell code duplication is needed.
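A variant that avoids the external sort and also prints the owning dn: line might look like the sketch below. This is an assumption-laden alternative rather than the one-liner above: the function name dup_positions is made up, and it treats two rdcPosition lines as duplicates only when the whole line is identical within the same dn: section.

```shell
#!/bin/sh
# Hypothetical helper: for each dn: section, print the dn: line followed by
# every rdcPosition line that occurs more than once inside that section.
dup_positions() {
    awk '
        /^dn:/ {                  # a new section starts: forget the previous one
            dn = $0
            shown = 0
            split("", seen)       # portable way to empty an awk array
            next
        }
        /^rdcPosition:/ {
            if (++seen[$0] == 2) {            # report on the second sighting only
                if (!shown) { print dn; shown = 1 }
                print
            }
        }
    ' "$@"
}
```

Call it as dup_positions input_data_file, or pipe the data in on standard input. Because the counts are reset at every dn: line, no extra END rule is needed.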