awk/sed shell script help

Question 1

For GNU awk:

awk -F, '
    NR>1{
        sub("..","")                   # remove the first two letters (i.e. IE)
        d=""
        for(i=split($2,D,"/");i>0;i--) # reorder DD/MM/YYYY into "YYYY MM DD "
            d=d D[i] " "
        print strftime("%b %Y",mktime(d 0" "0" "0)),gensub("[0-9]"," & ",8,$1)
    }' file

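mktime builds a timestamp (in seconds since the epoch) from a string of the form YYYY MM DD HH MM SS.
strftime converts that timestamp into the required format (%b %Y in this case).
gensub replaces the 8th match (8) of a digit ([0-9]) in the first field ($1) with itself (&) surrounded by spaces.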
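If GNU awk isn't available, the three gawk extensions used above (mktime, strftime, gensub) can be avoided. The following is a minimal sketch in POSIX awk, not part of the original answer: it looks the month name up in a table instead of calling strftime, and it cuts the number with substr instead of gensub. It assumes the same input layout the gawk script relies on (a header line, then a two-letter prefix plus 12 digits, a comma, and DD/MM/YYYY). The function name animal_dob_posix and the sample rows are made up; the rows are chosen to match the output shown in the Perl answer, and the days are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: POSIX-awk version of the GNU awk answer.
# Avoids mktime/strftime/gensub, which are gawk-only extensions.
# Assumes: header line first, then "IE" + 12 digits, then DD/MM/YYYY.
animal_dob_posix() {
    awk -F, '
        BEGIN { split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", M, " ") }
        NR > 1 {
            n = substr($1, 3)      # drop the two-letter prefix (IE)
            split($2, D, "/")      # D[1]=DD, D[2]=MM, D[3]=YYYY
            # D[2] + 0 turns "02" into the numeric index 2 for the month table;
            # the number is cut as 7 digits, the 8th digit, then the rest
            print M[D[2] + 0], D[3], substr(n, 1, 7), substr(n, 8, 1), substr(n, 9)
        }' "$@"
}

# Usage with a made-up sample row (reads stdin when no file is given)
printf '%s\n' 'AnimalNumber,DOB' 'IE161289240602,04/02/2010' | animal_dob_posix
```

Like the gensub call, this hard-codes the position of the single digit, so it only fits numbers of the same fixed width.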

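Since this is only string reformatting, sed can do it too. This first version relies on GNU sed: -r enables extended regexes, and the e flag runs the resulting pattern space as a shell command: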

sed -r '
    1d
    s/./ & /10
    s|(../)(../)|\2\1|
    s/..([^,]*),([^,]*).*/date -d "\2" +"%b %Y \1"/e
    ' file
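To see what the e flag actually runs, the substitutions can be applied one at a time without it. This sketch uses plain BRE syntax (so no -r) and prints the pattern space after each step for one made-up row; the last line is the command that e would execute. The variable names are only for illustration.

```shell
#!/bin/sh
# Sketch: trace the sed substitutions step by step, minus the "e" flag,
# so nothing is executed. The input row is a made-up example.
row='IE161289240602,04/02/2010'

# 1) put spaces around the 10th character (the single check digit)
step1=$(printf '%s\n' "$row" | sed 's/./ & /10')
printf '%s\n' "$step1"   # IE1612892 4 0602,04/02/2010

# 2) swap DD/ and MM/ so that date -d sees MM/DD/YYYY
step2=$(printf '%s\n' "$step1" | sed 's|\(../\)\(../\)|\2\1|')
printf '%s\n' "$step2"   # IE1612892 4 0602,02/04/2010

# 3) drop the prefix and build the date command
step3=$(printf '%s\n' "$step2" | sed 's/..\([^,]*\),\([^,]*\).*/date -d "\2" +"%b %Y \1"/')
printf '%s\n' "$step3"   # date -d "02/04/2010" +"%b %Y 1612892 4 0602"
```

The swap in step 2 matters because GNU date reads an all-numeric date like 02/04/2010 as month/day/year.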

Or sed without the e command:

sed '
    1d
    s/./ & /10
    s|\(../\)\(../\)|\2\1|
    s/..\([^,]*\),\([^,]*\).*/date -d "\2" +"%b %Y \1"/
    ' file | bash

Or:

sed '
    s/./ & /10
    s/../+"%b %Y /
    s/,/" -d /
    s|\(../\)\(../\)|\2\1|
    s/,/\n/
    1!P
    d' file | xargs -n3 date
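In this last variant, sed emits one line per record and xargs -n3 splits it into three arguments for date, honouring the double quotes. The sketch below makes that visible by stopping before date and, separately, by substituting printf for date so each argument is printed in brackets. The sample rows and the function names sample and sed_stage are hypothetical.

```shell
#!/bin/sh
# Sketch: inspect what the sed stage hands to xargs.
# Made-up input: a header line plus one record.
sample() {
    printf '%s\n' 'AnimalNumber,DOB' 'IE161289240602,04/02/2010'
}

# Same sed script as above; the header (line 1) is never printed.
sed_stage() {
    sed '
        s/./ & /10
        s/../+"%b %Y /
        s/,/" -d /
        s|\(../\)\(../\)|\2\1|
        s/,/\n/
        1!P
        d'
}

sample | sed_stage
# +"%b %Y 1612892 4 0602" -d 02/04/2010

# printf stands in for date, showing the three arguments it would get
sample | sed_stage | xargs -n3 printf '[%s]\n'
# [+%b %Y 1612892 4 0602]
# [-d]
# [02/04/2010]
```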

Question 2

I'd "use perl":

#!/usr/bin/env perl 
use strict;
use warnings;

use Time::Piece;

#get the column names out of the file. We remove the trailing linefeed. 
#<> is the magic input file handle, so it reads from STDIN or files
#specified on command line, e.g. myscript.pl file_to_process.csv
my @headers = split ( /,/, <> =~ s/\n//r );

while ( <> ) { 
    chomp; #strip linefeed. 
    my %stuff;
    #this makes use of the fact we know the headers already
    #so we can map from the line into named columns. 
    @stuff{@headers} = split /,/; #read comma sep into hash

    #DOB:
    #take date, parse it into a unix time, then use strftime to output "Mon year"
    print Time::Piece -> strptime ( $stuff{'DOB'}, "%d/%m/%Y" ) -> strftime("%b %Y");
    #regex match against AnimalNumber, and then join it with space separation. 
    print "\t"; #separator
    print join ( " ", $stuff{'AnimalNumber'} =~ m/(\d+)(\d)(\d{4})/ );
    print "\n";
}

This outputs:

Feb 2010    1612892 4 0602
Jan 2009    1414244 9 0333
Jan 2007    1514244 2 0395

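This works by:

Reading the magic filehandle <>, which takes input from a pipe or from filenames given on the command line.
Reading the first line and splitting it into @headers.
Iterating over each further line and mapping the comma-separated values into a hash (called %stuff).
Taking DOB from %stuff and processing it as a date with strptime/strftime, as required.
Taking AnimalNumber from %stuff and using a regex pattern to extract the wanted digits.
Because the pattern uses multiple capture groups, the captured elements are returned as a list, which join then glues together with a space separator.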
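For comparison with the shell answers, the same capture-group split can be written as a sed ERE. This is a sketch, not part of the Perl answer: [0-9] stands in for \d, which POSIX EREs don't have, and the anchored [^0-9]* and .* discard the text outside the groups, so only the three captures are printed, space-separated, much as join does. The greedy [0-9]+ takes as many digits as it can while still leaving one digit and then four digits for the other two groups. The function name split_number is made up.

```shell
#!/bin/sh
# Sketch: Perl's m/(\d+)(\d)(\d{4})/ as a sed ERE (hypothetical helper).
# Everything before the first digit and after the last group is dropped.
split_number() {
    sed -E 's/^[^0-9]*([0-9]+)([0-9])([0-9]{4}).*$/\1 \2 \3/'
}

echo 'IE161289240602' | split_number   # 1612892 4 0602
```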

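Edit: since you're looking at sorting, you'll need to read all of the data into memory first (the version above avoids that for efficiency reasons).

That said, here is a version that does: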

#!/usr/bin/env perl 
use strict;
use warnings;

use Time::Piece;

my @headers = split( /,/, <> =~ s/\n//r );

my @records;

while (<>) {
    chomp;    #strip linefeed.
    my %stuff;

    #this makes use of the fact we know the headers already
    #so we can map from the line into named columns.
    @stuff{@headers} = split /,/;    #read comma sep into hash

    #DOB:
    #take date, parse it into a unix time, then use strftime to output "Mon year"
    $stuff{'formtime'} =
        Time::Piece->strptime( $stuff{'DOB'}, "%d/%m/%Y" )->strftime("%b %Y");

    #regex match against AnimalNumber; keep the captured groups as an array ref
    #so they can be sorted on now and joined with spaces when printing.
    $stuff{'number_arr'} = [ $stuff{'AnimalNumber'} =~ m/(\d+)(\d)(\d{4})/ ];

    push( @records, \%stuff );
}

foreach
    my $record ( sort { $b->{'number_arr'}->[2] <=> $a->{'number_arr'}->[2] }
    @records )
{
    print join( "\t",
        $record->{'formtime'}, join( " ", @{ $record->{'number_arr'} } ),
        ),
        "\n";
}

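This is similar to the above, but it first preprocesses each record into an array of hashes, then sorts on the "key" field, the last set of four digits in number_arr (in descending order), before printing the output.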
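If the first (streaming) script is otherwise good enough, a different approach is to leave the Perl unchanged and sort its output with sort(1). This is a sketch, not the answer's method: it assumes the output lines look exactly like those shown earlier, so the last four digits are the fifth blank-separated field, and -k5,5nr sorts on that field numerically in descending order, matching the $b <=> $a comparison. The function name sort_by_last4 is made up.

```shell
#!/bin/sh
# Sketch: sort already-formatted lines by their last four digits, descending.
# Fields are blank-separated: 1=Mon 2=Year 3,4,5=number parts.
sort_by_last4() {
    sort -k5,5nr
}

# Usage on the output lines shown for the first script
printf 'Feb 2010\t1612892 4 0602\nJan 2009\t1414244 9 0333\nJan 2007\t1514244 2 0395\n' | sort_by_last4
```

sort must read all of its input before it can print anything, so this moves the "whole data set" step out of Perl rather than removing it.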

Answer