I need to extract 2 strings from specific lines of multiple files and print them to a new tab-separated file.

Answer 1

You can use sed in a loop over each file in the current folder. The relevant parts are extracted and appended to a file with >>file.

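# Collect the "From file" line in the hold space, join it with the ratio line,
# then reduce the joined text to "<name><TAB><ratio>".
# \t in the replacement is a GNU sed extension.
for f in *; do
    sed -n -e '/^From file/ H;' \
           -e '/Ratio of morphemes over utterances/ {H; x; s/\n//g; s/From file <\(.*\)>.*Ratio of morphemes over utterances = \([0-9]*\.[0-9]*\).*/\1\t\2/; p;}' "$f"
done >>file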

Answer 2

perl -0nE 'say "$1\t$2" if /From file <(.*?)>.*over utterances = (\d\S*)/s' * > out
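
For readers who find the one-liner terse, here is a rough Python sketch of the same idea: read each file whole, then apply one regular expression in which "." also matches newlines. The name extract_pair and the sample text are made up for illustration, and the real files may be laid out differently.

```python
import re

# Same pattern as the Perl one-liner; re.DOTALL plays the role of the /s flag.
PATTERN = re.compile(r"From file <(.*?)>.*over utterances = (\d\S*)", re.DOTALL)


def extract_pair(text):
    """Return (file name, ratio) from the whole text of one file, or None."""
    match = PATTERN.search(text)
    return match.groups() if match else None


if __name__ == "__main__":
    # Hypothetical sample; the real files may contain other lines.
    sample = (
        "From file <adam01.cha>\n"
        "some other line\n"
        "Ratio of morphemes over utterances = 2.213\n"
    )
    pair = extract_pair(sample)
    if pair:
        print("\t".join(pair))
```

Because the first group is non-greedy, it stops at the first closing ">" after "From file <", while the greedy ".*" that follows skips ahead to the last "over utterances = " in the text.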

Answer 3

Since you mentioned being familiar with Python, here is a Python script that does the job.

#!/usr/bin/env python3
import os
import re
import sys


def read_file(filepath):
    # errors='replace' keeps undecodable bytes from aborting the walk
    with open(filepath, errors='replace') as fd:
        for line in fd:
            clean_line = line.strip()

            if 'From file' in clean_line:
                # Split on '<', '>' and spaces; the file name is the
                # second-to-last item, e.g. 'From file <adam01.cha>'
                words = re.split('<|>| ', clean_line)
                print(words[-2], end='\t')

            if 'Ratio of morphemes over utterances' in clean_line:
                # Keep only the value after the last '='
                print(clean_line.split('=')[-1].strip())


def find_files(treeroot):
    selfpath = os.path.abspath(__file__)
    for dirpath, subdirs, files in os.walk(treeroot):
        for f in files:
            filepath = os.path.abspath(os.path.join(dirpath, f))
            if selfpath == filepath:
                continue  # do not parse the script itself
            try:
                read_file(filepath)
            except OSError:
                pass  # skip files that cannot be read


def main():
    directory = '.'
    if len(sys.argv) == 2:
        directory = sys.argv[1]
    find_files(os.path.abspath(directory))


if __name__ == '__main__':
    main()

Example run:

$ ./extract_data.py
adam02.cha	2.547
adam01.cha	2.213

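How it works is simple. The script uses os.walk() to traverse the directory recursively and find every file except the script itself, and for each one it runs read_file(), which reads the file line by line and looks for the relevant fields. re.split(), with spaces, < and > as delimiters, is used to break the file name line into a list of words more conveniently.

The script can take a directory as a command-line argument; if none is given, the current working directory is assumed. This lets you either pass a path or run the script from the directory where the files are stored. Creating a new file with all the data is simple: use the shell's redirection, as in ./extract_data.py > /path/to/new_file.txt. A further improvement is to read the files in sorted order by calling sorted() in the for loop over the files from os.walk(): for f in sorted(files):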
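
The sorted-order suggestion can be sketched on its own as below. This is only an illustration, not part of the script above: sorted_files is a hypothetical helper name, and sorting the subdirectory list in place is an extra step assumed here so that the directories are also visited in a predictable order.

```python
import os


def sorted_files(treeroot):
    """Yield file paths under treeroot in a stable, sorted order."""
    for dirpath, subdirs, files in os.walk(treeroot):
        # Sorting in place controls the order in which os.walk descends
        # (this works because topdown=True is the default).
        subdirs.sort()
        for name in sorted(files):
            yield os.path.join(dirpath, name)


if __name__ == "__main__":
    for path in sorted_files("."):
        print(path)
```

With this ordering, repeated runs over the same tree list the files in the same sequence, which makes the redirected output easier to compare between runs.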

Answer 4

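You can try this awk command:

awk -v OFS='\t' '/Ratio of morphemes over utterances/{print FILENAME,$NF;next}' *.cha

If you want to take the file name from the <adam01.cha> pattern on the From file line instead, try the awk command below. It prints the stored name rather than FILENAME and strips the angle brackets.

awk -v OFS='\t' '/From file/{filename=$NF; gsub(/[<>]/,"",filename)} filename && /Ratio of morphemes over utterances/{print filename,$NF;filename="";next}' *.txt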
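
The second awk command is in effect a small state machine: remember the name from the most recent From file line, then emit it once the ratio line turns up. A Python sketch that roughly mirrors that logic follows. pairs_from_lines is a hypothetical helper name, and the sample lines are invented to match the patterns used in this thread.

```python
def pairs_from_lines(lines):
    """Yield (name, ratio) pairs, roughly mirroring the awk logic."""
    name = ""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if "From file" in line:
            # The last field looks like '<adam01.cha>'; drop the brackets.
            name = fields[-1].strip("<>")
        elif name and "Ratio of morphemes over utterances" in line:
            yield name, fields[-1]
            name = ""  # wait for the next 'From file' line


if __name__ == "__main__":
    # Invented sample lines, assuming one 'From file' line per ratio line.
    sample = [
        "From file <adam01.cha>",
        "",
        "Ratio of morphemes over utterances = 2.213",
    ]
    for name, ratio in pairs_from_lines(sample):
        print(name, ratio, sep="\t")
```

Resetting the stored name after each match means a ratio line with no preceding From file line is ignored instead of being paired with a stale name.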
