Comparing two files

Question 1

Assuming both files are already sorted:

join -j1 -t\| entity.txt reference.txt

If they are not sorted, sort them first:

sort entity.txt -o entity-sorted.txt
sort reference.txt -o reference-sorted.txt
join -j1 -t\| entity-sorted.txt reference-sorted.txt
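As a rough end-to-end sketch of the sort-then-join steps, the snippet below builds the two files in a scratch directory and joins them. The sample rows are the ones from the entity.txt and reference.txt listings elsewhere in this thread. The scratch directory and the LC_ALL=C setting are my own assumptions, added because join expects its inputs to be sorted in the same collation order it uses itself.

```shell
# Scratch directory (an assumption of this sketch) so no existing file is touched.
workdir=$(mktemp -d)

# Sample rows taken from the entity.txt / reference.txt listings in this thread.
printf '%s\n' 624197 624252 624264 624276 624280 624309 624317 \
    > "$workdir/entity.txt"
printf '%s\n' '624252|624346' '624264|1070122' '624264|624346' \
    '624276|624588' '624280|624346' '624280|624582' \
    '624298|624588' '624319|333008' '624330|624588' \
    > "$workdir/reference.txt"

# Sort both files on the first '|'-separated field, using the C collation.
LC_ALL=C sort -t'|' -k1,1 -o "$workdir/entity-sorted.txt" "$workdir/entity.txt"
LC_ALL=C sort -t'|' -k1,1 -o "$workdir/reference-sorted.txt" "$workdir/reference.txt"

# Join on field 1 of both files, with the same collation as the sort above.
result=$(LC_ALL=C join -j1 -t'|' \
    "$workdir/entity-sorted.txt" "$workdir/reference-sorted.txt")
printf '%s\n' "$result"

# Clean up the scratch directory.
rm -rf "$workdir"
```

Only the reference lines whose first field also appears in entity.txt are printed. Because entity.txt has a single field, each output line looks exactly like the matching reference line.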

Question 2

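You can do this with a bash/zsh one-liner. Assuming the data is in files named entity and reference, type

for i in $(cat entity); do grep "^$i|" reference; done

at the console. The trailing | in the pattern keeps an ID from also matching longer IDs that merely start with the same digits.

You can also redirect the whole output to a file named output, like this:

for i in $(cat entity); do grep "^$i|" reference; done > output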
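The loop can also be written a little more defensively. This is only a sketch: lookup_refs is a hypothetical helper name, and the sample rows are partly made up (the 6242521 row is invented to show the prefix problem). It reads the entity file line by line instead of word-splitting the output of cat, skips blank lines, and anchors the pattern on the | delimiter.

```shell
# Hypothetical helper: for each ID in the entity file ($1), print the lines of
# the reference file ($2) whose first '|'-separated field is exactly that ID.
lookup_refs() {
    while IFS= read -r id; do
        # Skip blank lines so an empty ID never becomes a pattern.
        [ -n "$id" ] || continue
        # grep exits with status 1 when nothing matches; that is not an error here.
        grep "^$id|" "$2" || true
    done < "$1"
}

# Demo data in a scratch directory; the 6242521 row is invented for illustration.
workdir=$(mktemp -d)
printf '%s\n' 624252 '' 624264 > "$workdir/entity"
printf '%s\n' '624252|624346' '6242521|999' '624264|1070122' '624264|624346' \
    > "$workdir/reference"

result=$(lookup_refs "$workdir/entity" "$workdir/reference")
printf '%s\n' "$result"

rm -rf "$workdir"
```

The invented 6242521 row is not printed, although it starts with the same digits as 624252. Note that the IDs are still treated as regular expressions, which is harmless for purely numeric IDs.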

Question 3

A solution using Perl:

Contents of entity.txt:

$ cat entity.txt
624197
624252
624264
624276
624280
624309
624317

Contents of reference.txt:

$ cat reference.txt 
624252|624346
624264|1070122
624264|624346
624276|624588
624280|624346
624280|624582
624298|624588
624319|333008
624330|624588

Contents of the Perl script:

$ cat script.pl
use warnings;
use strict;

## Check arguments.
@ARGV == 2 or die qq[Usage: perl $0 <entity-file> <reference-file>\n];

## File in process.
my $process_file = 1;

## Hash to save entities.
my %entity;


while ( <> ) {
        ## Process file of entities. Remove leading and trailing spaces, and save the
        ## number to a hash.
        if ( $process_file == 1 ) {
                s/\A\s*//;
                s/\s*\z//;
                if ( length $_ ) { $entity{ $_ } = 1 }
                next;
        }

        ## Process file of references. Get first field and search it in the hash.
        ## If found, print the line.
        my @f = split /\|/, $_, 2;
        if ( exists $entity{ $f[0] } ) {
                print;
        }

} continue {
        ## Move on to the next file once the end of the current one is reached.
        if ( eof ) { ++$process_file }
}

Running the script without arguments:

$ perl script.pl
Usage: perl script.pl <entity-file> <reference-file>

Running the script with arguments, and the result:

$ perl script.pl entity.txt reference.txt 
624252|624346
624264|1070122
624264|624346
624276|624588
624280|624346
624280|624582
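The same hash-lookup idea (load the entity IDs into a table, then test the first field of each reference line) can be sketched with awk from the shell. This is an alternative technique, not a translation of the Perl script. The function name filter_refs and the scratch directory are assumptions of the sketch, and the sample rows are the ones listed above.

```shell
# Hypothetical helper: print the lines of the reference file ($2) whose first
# '|'-separated field appears in the entity file ($1).
filter_refs() {
    awk -F'|' '
        NR == FNR {                          # still reading the first file
            id = $0
            gsub(/^[ \t]+|[ \t]+$/, "", id)  # trim surrounding blanks
            if (id != "") seen[id] = 1       # remember the ID
            next
        }
        $1 in seen                           # second file: print on a match
    ' "$1" "$2"
}

# Demo with the sample rows from the listings above, in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 624197 624252 624264 624276 624280 624309 624317 \
    > "$workdir/entity.txt"
printf '%s\n' '624252|624346' '624264|1070122' '624264|624346' \
    '624276|624588' '624280|624346' '624280|624582' \
    '624298|624588' '624319|333008' '624330|624588' \
    > "$workdir/reference.txt"

result=$(filter_refs "$workdir/entity.txt" "$workdir/reference.txt")
printf '%s\n' "$result"

rm -rf "$workdir"
```

Neither file needs to be sorted, and the reference file is read only once. One caveat: the NR == FNR test assumes the entity file is not empty, because otherwise awk treats the reference file as the first file.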

Answer