Edited 2017/01/30

Question 1

I would use perl, with something like:

perl -MFile::Find -MClone=clone -lne '
  # parse the strings.txt input, here looking for the sequences of
  # 0 or more characters (.*?) in between two " characters
  for (/"(.*?)"/g) {
    # @needle is an array of associative arrays whose keys
    # are the "strings" for each line.
    $needle[$n]{$_} = undef;
  }
  $n++;

  END{
    sub wanted {
      return unless -f; # only regular files
      my $needle_clone = clone(\@needle);
      if (open FILE, "<", $_) {
        LINE: while (<FILE>) {
          # read the file line by line
          for (my $i = 0; $i < $n; $i++) {
            for my $s (keys %{$needle_clone->[$i]}) {
              if (index($_, $s)>=0) {
                # if the string is found, we delete it from the associative
                # array.
                delete $needle_clone->[$i]{$s};
                unless (%{$needle_clone->[$i]}) {
                  # if the associative array is empty, that means we have
                  # found all the strings for that $i, that means we can
                  # stop processing, and the file matches
                  print $File::Find::name;
                  last LINE;
                }
              }
            }
          }
        }
        close FILE;
      }
    }
    find(\&wanted, ".")
  }' /path/to/strings.txt

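This keeps the number of string searches to a minimum: a string that has already been found in a file is not searched for again, and processing of the file stops as soon as one line of the list is fully matched.

Here the files are processed line by line. If the files are reasonably small, you could process each one as a whole, which would simplify the code a bit and might improve performance.

The list file is expected to be in the following format: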

 "surveillance data" "surveillance technology" "cctv camera"
 "social media" "surveillance techniques" "enforcement agencies"
 "social control" "surveillance camera" "social security"
 "surveillance data" "security guards" "social networking"
 "surveillance mechanisms" "cctv surveillance" "contemporary surveillance"

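In that format, each line holds some number (not necessarily 3) of strings enclosed in double quotes. The quoted strings themselves cannot contain double quote characters, and the double quotes are not part of the text being searched for. That is, if the list file contained: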

"A" "B"
"1" "2" "3"

the command would report the path of every regular file in the current directory and below that contains either

both A and B
or (not an exclusive or) all of 1, 2 and 3

anywhere in it.
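
To make that matching rule concrete, here is a minimal shell sketch of the same semantics. It is not the perl code above, only an illustration: a file matches when, for at least one line of the list file, every quoted string on that line occurs somewhere in the file. The function names contains_all and file_matches are made up for this example, and it assumes a POSIX-like shell that supports local (bash and dash do) and a grep with -q and -F.

```shell
# contains_all FILE STRING...: succeed if FILE contains every STRING as a
# fixed string, anywhere in the file (not necessarily on the same line).
contains_all() {
  local file=$1 s
  shift
  for s in "$@"; do
    grep -qF -- "$s" "$file" || return 1
  done
}

# file_matches LIST FILE: succeed if at least one line of LIST has all of
# its double-quoted strings present in FILE.
file_matches() {
  local list=$1 target=$2 line rest
  while IFS= read -r line; do
    # Collect the quoted strings of this line as positional parameters.
    set --
    rest=$line
    while :; do
      case $rest in
        *\"*\"*) ;;            # at least one more "..." pair left
        *) break ;;
      esac
      rest=${rest#*\"}         # drop everything through the opening quote
      if [ -n "${rest%%\"*}" ]; then
        set -- "$@" "${rest%%\"*}"   # text up to the closing quote
      fi
      rest=${rest#*\"}         # drop everything through the closing quote
    done
    if [ "$#" -gt 0 ] && contains_all "$target" "$@"; then
      return 0
    fi
  done < "$list"
  return 1
}
```

Unlike the perl version, this rereads the file once per string, so it trades speed for brevity. Something like `find . -type f | while IFS= read -r f; do file_matches /path/to/strings.txt "$f" && printf '%s\n' "$f"; done` would list the matching files, as long as no file name contains a newline.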

Question 2

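Since agrep does not seem to be available on your system, have a look at these sed- and awk-based alternatives, which apply grep with an AND operation using patterns read from a local file.

PS: Since you are on OS X, I am not sure whether the awk version you have supports the usage below.

awk can simulate grep with an AND operation over multiple patterns, like this: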
awk '/pattern1/ && /pattern2/ && /pattern3/'

So you can transform your pattern file from this:

$ cat ./tmp/d1.txt
"surveillance data" "surveillance technology" "cctv camera"
"social media" "surveillance techniques" "enforcement agencies"
"social control" "surveillance camera" "social security"
"surveillance data" "security guards" "social networking"
"surveillance mechanisms" "cctv surveillance" "contemporary surveillance"

to this:

$ sed 's/" "/\/ \&\& \//g; s/^"/\//g; s/"$/\//g' ./tmp/d1.txt
/surveillance data/ && /surveillance technology/ && /cctv camera/
/social media/ && /surveillance techniques/ && /enforcement agencies/
/social control/ && /surveillance camera/ && /social security/
/surveillance data/ && /security guards/ && /social networking/
/surveillance mechanisms/ && /cctv surveillance/ && /contemporary surveillance/

PS: You can redirect the output to another file by appending >anotherfile at the end, or you can use the sed -i option to change the search-term pattern file in place.

Then you just need to feed awk the awk-formatted patterns from this pattern file:

$ while IFS= read -r line;do awk "$line" *.txt;done<./tmp/d1.txt #d1.txt = my test pattern file

Alternatively, you can leave the original pattern file unchanged and apply sed to each of its lines on the fly, like this:

while IFS= read -r line;do 
  line=$(sed 's/" "/\/ \&\& \//g; s/^"/\//g; s/"$/\//g' <<<"$line")
  awk "$line" *.txt
done <./tmp/d1.txt

Or as a one-liner:

$ while IFS= read -r line;do line=$(sed 's/" "/\/ \&\& \//g; s/^"/\//g; s/"$/\//g' <<<"$line"); awk "$line" *.txt;done <./tmp/d1.txt

The commands above return the correct AND results for the following test files:

$ cat d2.txt
This guys over there have the required surveillance technology to do the job.
The other guys not only have efficient surveillance technology, but they also gather surveillance data by one cctv camera.

$ cat d3.txt
All surveillance data are locked.
All surveillance data are locked and guarded by security guards.
There are several surveillance mechanisms (i.e cctv surveillance, contemporary surveillance, etv)

Results:

$ while IFS= read -r line;do awk "$line" *.txt;done<./tmp/d1.txt
#or while IFS= read -r line;do line=$(sed 's/" "/\/ \&\& \//g; s/^"/\//g; s/"$/\//g' <<<"$line"); awk "$line" *.txt;done <./tmp/d1.txt
The other guys not only have efficient surveillance technology, but they also gather surveillance data by one cctv camera.
There are several surveillance mechanisms (i.e cctv surveillance, contemporary surveillance, etv)

Update:
The awk solution above prints the matching lines of the txt files.
If you want to display the file names instead of the contents, use the following awk where necessary:

awk "$line""{print FILENAME}" *.txt
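
One caveat with the conversion above is that awk treats each /.../ as a regular expression, so a search term containing characters such as . or ( or / would not be matched literally. A possible variant, offered here as an assumption rather than something I tested on OS X, turns each line of the pattern file into index($0, "...") tests, which compare fixed strings. The helper name to_awk_index is made up for this sketch, and it expects lines without leading or trailing whitespace.

```shell
# to_awk_index: read lines of "quoted" "strings" on stdin and print, for each
# line, an awk condition that tests for every string literally via index().
to_awk_index() {
  sed 's/" "/") \&\& index($0, "/g; s/^"/index($0, "/; s/"$/")/'
}

# Possible usage (prints each matching file name once):
#   to_awk_index < ./tmp/d1.txt |
#     while IFS= read -r cond; do awk "$cond"' { print FILENAME }' *.txt; done |
#     sort -u
```

Like the /.../ version, this still tests line by line, so all the strings have to appear on the same line of a file. Backslashes inside a search term would still be read as awk string escapes.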

Question 3

This problem is a bit awkward, but you could approach it like this:

while read one two three four five six
  do grep -lF "$one $two" *files* | xargs grep -lF "$three $four" | xargs grep -lF "$five $six"
done < patterns | sort -u

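This assumes that the patterns file contains exactly six words per line (three patterns of two words each). The logical and is implemented by chaining three consecutive grep filters. Note that this is not particularly efficient; an awk solution would probably be faster.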
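
If the number of patterns per line varies, the same chaining idea can be written as a loop. The following is only a sketch under some assumptions: files_with_all is a hypothetical helper name, candidate file names arrive on standard input one per line (so names containing newlines are not supported), and xargs understands -0, which GNU and BSD xargs do although POSIX does not require it.

```shell
# files_with_all STRING...: read candidate file names on stdin (one per line)
# and print those that contain every STRING as a fixed string.
files_with_all() {
  local candidates s
  candidates=$(cat)
  for s in "$@"; do
    # Nothing left to filter: no file can match all strings.
    [ -n "$candidates" ] || return 0
    # Keep only the candidates that also contain this string.
    candidates=$(printf '%s\n' "$candidates" | tr '\n' '\0' |
      xargs -0 grep -lF -- "$s")
  done
  if [ -n "$candidates" ]; then
    printf '%s\n' "$candidates"
  fi
}

# Possible usage:
#   printf '%s\n' *.txt | files_with_all "surveillance data" "cctv camera"
```

Each pass narrows the candidate list, so later strings are only searched for in files that already contain the earlier ones.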

Question 4

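Here is another approach that seemed to work in my tests.

I copied the strings file data into a file named d1.txt and moved it to a separate directory (e.g. tmp) so that the strings file itself does not get grepped later on.

Then I inserted a semicolon between each search term in this strings file (d1.txt in my case) with the following command: sed -i 's/" "/";"/g' ./tmp/d1.txt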

$ cat ./tmp/d1.txt
"surveillance data" "surveillance technology" "cctv camera"
"social media" "surveillance techniques" "enforcement agencies"
"social control" "surveillance camera" "social security"
"surveillance data" "security guards" "social networking"
"surveillance mechanisms" "cctv surveillance" "contemporary surveillance"
$ sed -i 's/" "/";"/g' ./tmp/d1.txt
$ cat ./tmp/d1.txt
"surveillance data";"surveillance technology";"cctv camera"
"social media";"surveillance techniques";"enforcement agencies"
"social control";"surveillance camera";"social security"
"surveillance data";"security guards";"social networking"
"surveillance mechanisms";"cctv surveillance";"contemporary surveillance"

Then I removed the double quotes with the command sed -i 's/"//g' ./tmp/d1.txt. PS: This may not really be necessary, but I removed the double quotes for my tests.

$ sed -i 's/"//g' ./tmp/d1.txt && cat ./tmp/d1.txt
surveillance data;surveillance technology;cctv camera
social media;surveillance techniques;enforcement agencies
social control;surveillance camera;social security
surveillance data;security guards;social networking
surveillance mechanisms;cctv surveillance;contemporary surveillance

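Now we can grep all the files in the current directory using agrep, a program designed to provide multi-pattern grep with AND operations.

agrep requires multiple patterns to be separated by a semicolon (;) in order to be evaluated as AND.

In my tests I created two sample files with the following contents: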

$ cat d2.txt
This guys over there have the required surveillance technology to do the job.

The other guys not only have efficient surveillance technology, but they also gather surveillance data by one cctv camera.

$ cat d3.txt
All surveillance data are locked.
All surveillance data are locked and guarded by security guards.
There are several surveillance mechanisms (i.e cctv surveillance, contemporary surveillance, etv)

Running agrep in the current directory returns the correct lines (with AND applied) together with the file names:

$ while IFS= read -r line;do agrep "$line" *;done<./tmp/d1.txt
d2.txt: The other guys not only have efficient surveillance technology, but they also gather surveillance data by one cctv camera.
d3.txt: There are several surveillance mechanisms (i.e cctv surveillance, contemporary surveillance, etv)

Edited 2017/01/29