Grep for many patterns in files and show which pattern matches which file, without re-reading the files

Question

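If you want to use grep, the best you can do is use the -m 1 option, which makes grep stop reading the current input file at the first match. grep will still read each input file multiple times (once for each pattern), but it will be faster (unless the match is on or near the last line of the file).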

For example:

#!/bin/bash

# Speed up each grep by exiting on 1st match with -m 1
#
# This still reads each file multiple times, but should run faster because it
# won't read the entire file each time unless the match is on the last line.
#
# Also reduce repetitive code by using an array and a for loop iterating over
# the indices of the array, rather than the values

patterns=(pattern1 pattern2 pattern3 patternN)

# iterate over the indices of the array (with `${!`), not the values.
for p in "${!patterns[@]}"; do
  # escape backslashes first, then forward slashes and "&", so the
  # pattern can be used safely in the sed replacement below
  esc=$(printf '%s\n' "${patterns[$p]}" | sed -e 's:\\:\\\\:g; s:/:\\/:g; s:&:\\&:g')
  grep -liE -m 1 "${patterns[$p]}" file_glob* |
    sed -e "s/^/$esc\t/" > "temp_pattern$(($p+1)).txt"
done

Note: the $p+1 is there because bash arrays are zero-based. The +1 makes the temp_pattern files start from 1.
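To see what kind of output the loop produces, here is a small self-contained sketch that runs the same idea against throwaway sample data. The function name tag_matches, the sample_* file names, their contents and the two patterns are all made up for illustration. It also swaps the sed step for a printf inside a while read loop, so no escaping of the pattern is needed; that is a substitution on my part, not the method shown above.

```shell
#!/bin/bash
# Demo only: sample files, patterns and the function name are hypothetical.

# tag_matches DIR PATTERN...
# For each pattern, write "pattern<TAB>filename" lines to
# DIR/temp_patternN.txt (N starts at 1), one line per matching file.
tag_matches() {
  local dir=$1; shift
  local patterns=("$@")
  local p f
  # iterate over the indices of the array, as in the script above
  for p in "${!patterns[@]}"; do
    # -l lists matching files; -m 1 stops reading a file at its first match
    grep -liE -m 1 -- "${patterns[$p]}" "$dir"/sample_* |
      while IFS= read -r f; do
        # prefix each file name (directory stripped) with the pattern
        printf '%s\t%s\n' "${patterns[$p]}" "${f##*/}"
      done > "$dir/temp_pattern$((p+1)).txt"
  done
}

# build two tiny sample files in a scratch directory
demo_dir=$(mktemp -d)
printf 'apple\nbanana\n'  > "$demo_dir/sample_a.txt"
printf 'Banana\ncherry\n' > "$demo_dir/sample_b.txt"

tag_matches "$demo_dir" apple banana
cat "$demo_dir"/temp_pattern*.txt
```

With this sample data, temp_pattern1.txt should hold one line for apple (sample_a.txt), and temp_pattern2.txt should hold two lines for banana (sample_a.txt and sample_b.txt, because -i ignores case).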

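You can do what you want if you use a scripting language such as awk or perl. For example, the following Perl script reads each input file only once and checks each line against all of the patterns that haven't yet been seen in that file. It keeps track of which patterns have been seen in a particular file (using the array @seen), and also notices when all of the available patterns have been seen in a file (again using @seen), in which case it closes the current file.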

#!/usr/bin/perl
use strict;

# array to hold the patterns
my @patterns = qw(pattern1 pattern2 pattern3 patternN);

# Array-of-Arrays (AoA, see man pages for perllol and perldsc)
# to hold matches
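# Start with one empty array per pattern so that dereferencing
# $matches[$i] later is safe under "use strict" even with no matches.
my @matches = map { [] } @patterns;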

# Array for keeping track of whether current pattern has
# been seen already in current file
my @seen;

# read each line of each file
while(<>) {
  # check each line against all patterns that haven't been seen yet
  for my $i (keys @patterns) {
    next if $seen[$i];
    if (m/$patterns[$i]/i) {
      # add the pattern and the filename to the @matches AoA
      push @{ $matches[$i] }, "$patterns[$i]\t$ARGV";
      $seen[$i] = 1;
    }
  };

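  # handle end-of-file AND having seen all patterns in a file.
  # Count the true entries in @seen: $#seen is only the highest index
  # set so far, so it can equal $#patterns before every pattern is seen.
  if (eof || scalar(grep { $_ } @seen) == @patterns) {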
    #print "closing $ARGV on line $.\n" unless eof;
    # close the current input file.  This will have
    # the effect of skipping to the next file.
    close(ARGV);
    # reset seen array at the end of every input file
    @seen = ();
  };
}

# now create output files
for my $i (keys @patterns) {
  #next unless @{ $matches[$i] }; # skip patterns with no matches
  my $outfile = "temp_pattern" . ($i+1) . ".txt";
  open(my $out,">",$outfile) || die "Couldn't open output file '$outfile' for write: $!\n";
  print $out join("\n", @{ $matches[$i] }), "\n";
  close($out);
}

The if (eof || ...) line tests for eof (end of file) on the current file OR for having seen all of the available patterns in the current file (i.e. every pattern in @patterns has been marked in @seen).

In either case, the @seen array is reset to empty so that it is ready for the next input file.

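In the latter case, we want to close the current input file early: we have already seen everything we want to see in it, so there is no need to keep reading and processing the rest of the file.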

BTW, if you don't want empty files to be created (i.e. when a pattern has no matches), uncomment the next unless @{ $matches[$i] } line in the output for loop.

If you don't need the temp files and just want all of the matches written to a single file, replace the final output for loop with:

for my $i (keys @patterns) {
  #next unless @{ $matches[$i] }; # skip patterns with no matches
  print join("\n", @{ $matches[$i] }), "\n";
}

and redirect the output to a file.

BTW, if you want to add the line number where the pattern first appears in the file, change:

push @{ $matches[$i] }, "$patterns[$i]\t$ARGV";

to:

push @{ $matches[$i] }, "$patterns[$i]\t$.\t$ARGV";

$. is a built-in Perl variable that holds the current line number of the input read with <>. It is reset to zero every time the current file (ARGV) is closed.
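For comparison, the same single-pass idea can be sketched with awk, the other language mentioned above. This is only a rough sketch under a few assumptions: the function name scan_once is made up, the patterns are passed as a newline-separated list and are assumed to be written in lowercase (each input line is lowercased to approximate grep -i), and the awk in use supports nextfile (gawk, BSD awk and recent mawk do). FNR plays the role of Perl's $. here, and output comes out grouped by file rather than by pattern.

```shell
#!/bin/bash
# Sketch only: single-pass variant using awk instead of Perl.
# Assumes an awk that supports "nextfile"; scan_once is a hypothetical name.

# scan_once PATTERN_LIST FILE...
# PATTERN_LIST is a newline-separated list of lowercase extended regexes.
# Prints "pattern<TAB>line-number<TAB>filename" for the first match of
# each pattern in each file, reading every file at most once.
scan_once() {
  local pats=$1; shift
  # pass the patterns through the environment so awk does not
  # interpret backslash escapes in them (as it would with -v)
  PATS="$pats" awk '
    BEGIN { n = split(ENVIRON["PATS"], pat, "\n") }

    # first line of each file: forget what was seen in the previous file
    FNR == 1 { split("", seen); nseen = 0 }

    {
      line = tolower($0)              # approximate case-insensitive matching
      for (i = 1; i <= n; i++) {
        if (!(i in seen) && line ~ pat[i]) {
          seen[i] = 1
          nseen++
          print pat[i] "\t" FNR "\t" FILENAME
        }
      }
      # every pattern has been found in this file: skip the rest of it
      if (nseen == n) nextfile
    }
  ' "$@"
}
```

It could be called as scan_once "$(printf 'pattern1\npattern2')" file_glob* > all_matches.txt, and the result piped through sort if grouping by pattern is wanted.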

Answer 1