Perl script problem: strings listed one per line in one file need to be deleted from another file

Answer 1

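In addition to the problem you asked about, your script has a major flaw: it makes a complete pass over "foo" for every line in "remove.txt". That is very inefficient. A better approach is to read "remove.txt", build one long regular expression from it, and then use that expression once to edit "foo".

The easiest way to do that is to push the search strings onto an array and then "join()" the array with the "|" (regex "or") character, which gives a string that can be used as a regular expression.

Here is a script that does this and fixes your original problem.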

#!/usr/bin/perl

use strict;
use warnings;

# first construct a regular expression containing every
# line that needs to be removed.  This is so we only have
# to run a single pass through $infile rather than one
# pass per line in $removefile.
my @remove = ();

my $removefile = 'remove.txt';
open(my $remfh, '<', $removefile) or die "couldn't open $removefile: $!\n";
while (<$remfh>) {
    chomp;
    next if (/^\s*$/);
    # quotemeta escapes regex metacharacters such as ( and ),
    # so every line is matched as a literal string
    push @remove, quotemeta($_);
}
close($remfh);

# choose one of the following two lines depending on
# whether you want to remove only entire lines or text
# within a line:
my $remove = '^(' . join("|", @remove) . ')$';
#my $remove = join("|", @remove);

# now remove the unwanted text from all lines in $infile
my $infile = 'foo';
system('perl', '-p', '-i', '-e', "s/$remove//g", $infile) == 0
    or die "in-place edit of $infile failed: $?\n";

# if you want to delete matching lines, try this instead:
#system('perl', '-n', '-i', '-e', "print unless /$remove/", $infile);
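
To make the difference between the two forms of $remove concrete, here is a small sketch that applies both to some made-up sample lines. The helper name build_pattern and the sample data are illustrative assumptions, not part of the script above. quotemeta is used so that the parentheses in the sample entry are matched literally.

```perl
use strict;
use warnings;

# Hypothetical helper: join literal strings into one alternation.
# A true $whole_line selects the anchored (entire line) form.
sub build_pattern {
    my ($whole_line, @strings) = @_;
    my $alt = join('|', map { quotemeta } @strings);
    return $whole_line ? qr/^(?:$alt)$/ : qr/(?:$alt)/;
}

my @remove = ('foo(1)', 'bar');    # made-up entries from "remove.txt"
my @lines  = ("foo(1)\n", "keep foo(1) here\n", "baz\n");

for my $whole (1, 0) {
    my $re = build_pattern($whole, @remove);
    print $whole ? "whole lines only:\n" : "anywhere in a line:\n";
    for my $line (@lines) {
        (my $copy = $line) =~ s/$re//g;    # edit a copy, keep the sample intact
        print "  $copy";
    }
}
```

With the anchored form only the first sample line is emptied. With the unanchored form the text "foo(1)" is also stripped from the middle of the second line.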

Answer 2

You need to use qq() and escape the regex metacharacters ( "(" and ")" ) in $bad_string.

            my $bad_string = "\\($line\\)";
            system( qq( perl -p -i -e 's/$bad_string//g' foo ) );
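
As an alternative to escaping the parentheses by hand, Perl's built-in \Q...\E (the in-pattern form of quotemeta) escapes every regex metacharacter in the text it encloses. The following is a minimal sketch of the difference, run in-process rather than through system; the values of $line and $text are made up for illustration.

```perl
use strict;
use warnings;

my $line = 'abc';                     # made-up sample value
my $text = "before (abc) after\n";    # made-up sample input

# Unescaped, "($line)" is a capture group, so only "abc" is removed
# and the parentheses stay behind.
(my $unescaped = $text) =~ s/($line)//g;

# \Q...\E quotes the metacharacters, so the literal "(abc)" is removed.
(my $escaped = $text) =~ s/\Q($line)\E//g;

print "unescaped: $unescaped";
print "escaped:   $escaped";
```

The first substitution leaves "before () after", which is the symptom of unescaped parentheses. The second removes the parentheses along with the string.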

Answer 3

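There are three elements to your question:

Build an "exclusion list". "Special" characters in the exclusion list can cause problems.
Read a file and exclude each line that "matches".
Write a new file.

I think there are a few things going on in your question that I would call "bad style":

Opening files with the three-argument form and lexical filehandles is good style.
Calling system to run perl from within perl is inefficient.
Quote interpolation is a nuisance and best avoided.
You are reprocessing your output file repeatedly, which is very inefficient. (Remember: disk I/O is the slowest thing you will do on your system.)

With that in mind, I would do it like this: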

#!/usr/bin/env perl
use strict;
use warnings;

my $infile = "remove.txt";
open( my $pattern_fh, '<', $infile ) or die "cannot open $infile $!";

#strip the trailing newlines so they don't end up inside the pattern
chomp( my @patterns = <$pattern_fh> );
close($pattern_fh);

#quotemeta escapes meta characters that'll break your pattern matching.
#grep {length} skips empty lines in the exclusion list.
my $regex = join( '|', map {quotemeta} grep {length} @patterns );
#compile the regex
$regex = qr/^($regex)$/;    #whole lines

print "Using regular expression: $regex\n";

open( my $input_fh,  '<', "foo" )     or die $!;
open( my $output_fh, '>', "foo.new" ) or die $!;

#tell print where to print by default.
#could instead print {$output_fh} $_;
select($output_fh);
while (<$input_fh>) {
    print unless m/$regex/;
}
close($input_fh);
close($output_fh) or die $!;

#rename/copy if it worked

(Note: this has not been exhaustively tested. If you can provide some sample data, I will test and update it as needed.)
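
The final "rename/copy if it worked" step is only a comment in the script. One way it could look is sketched below. The helper name replace_file and the choice to keep a ".bak" copy of the original are assumptions made for illustration, not something the answer specifies.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical helper: keep a backup of $orig, then move $new over it.
sub replace_file {
    my ( $orig, $new ) = @_;
    copy( $orig, "$orig.bak" ) or die "backup of $orig failed: $!";
    rename( $new, $orig )      or die "rename $new -> $orig failed: $!";
    return 1;
}

# Only replace the original if both files actually exist.
replace_file( 'foo', 'foo.new' ) if -e 'foo' && -e 'foo.new';
```

rename is effectively atomic only when both files are on the same filesystem, which is the case here because "foo.new" is written next to "foo".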
