awk script to prepare a CSV file

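I have been writing an awk script to prepare a CSV file before analyzing it. I need to produce an output file that contains columns 1-2, 10, 13-15, and 19-21. I also need to replace the numbers in column 2 with day names (1 = Monday, 2 = Tuesday...), convert column 21 from nautical miles to kilometers, and remove the double quotes ("") from columns 10, 13, and 14.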

Input:

"DAY_OF_MONTH","DAY_OF_WEEK","OP_UNIQUE_CARRIER","OP_CARRIER_AIRLINE_ID","OP_CARRIER","TAIL_NUM","OP_CARRIER_FL_NUM","ORIGIN_AIRPORT_ID","ORIGIN_AIRPORT_SEQ_ID","ORIGIN","DEST_AIRPORT_ID","DEST_AIRPORT_SEQ_ID","DEST","DEP_TIME","DEP_DEL15","DEP_TIME_BLK","ARR_TIME","ARR_DEL15","CANCELLED","DIVERTED","DISTANCE",
1,2,"EV",20366,"EV","N48901","4397",13930,1393007,"ORD",11977,1197705,"GRB","1003",0.00,"1000-1059","1117",0.00,0.00,0.00,174.00,
1,2,"EV",20366,"EV","N16976","4401",15370,1537002,"TUL",13930,1393007,"ORD","1027",0.00,"1000-1059","1216",0.00,0.00,0.00,585.00,
1,2,"EV",20366,"EV","N12167","4404",11618,1161802,"EWR",15412,1541205,"TYS","1848",0.00,"1800-1859","2120",0.00,0.00,0.00,631.00,

Output:

"DAY_OF_MONTH","DAY_OF_WEEK","ORIGIN","DEST","DEP_TIME","DEP_DEL15","CANCELLED","DIVERTED","DISTANCE"
1,Tuesday,ORD,GRB,1003,0.00,0.00,0.00,322.248
1,Tuesday,TUL,ORD,1027,0.00,0.00,0.00,1083.42
1,Tuesday,EWR,TYS,1848,0.00,0.00,0.00,1168.61

So far I have a command that extracts the columns I need:

cut -d "," -f1-2,10,13-15,19-21 'Jan_2020_ontime.csv' > 'flights_jan_20.csv'

I also have code that replaces the numbers in column 2 with the corresponding day name:

awk 'BEGIN {FS = OFS = ","}
     $2 == 1 {$2 = "Monday"}
     $2 == 2 {$2 = "Tuesday"}
     $2 == 3 {$2 = "Wednesday"}
     $2 == 4 {$2 = "Thursday"}
     $2 == 5 {$2 = "Friday"}
     $2 == 6 {$2 = "Saturday"}
     $2 == 7 {$2 = "Sunday"}
     {print}' file.csv

I also don't know how to wrap all of this code into a script that I can run later.

Answer 1

awk '
    BEGIN {
        split("Monday Tuesday Wednesday Thursday Friday Saturday Sunday",days)
        FS=OFS=","
    }
    NR > 1 {
        gsub(/"/,"")
        $2 = days[$2]
        $21 *= 1.852
    }
    { print $1, $2, $10, $13, $14, $15, $19, $20, $21 }
' file
"DAY_OF_MONTH","DAY_OF_WEEK","ORIGIN","DEST","DEP_TIME","DEP_DEL15","CANCELLED","DIVERTED","DISTANCE"
1,Tuesday,ORD,GRB,1003,0.00,0.00,0.00,322.248
1,Tuesday,TUL,ORD,1027,0.00,0.00,0.00,1083.42
1,Tuesday,EWR,TYS,1848,0.00,0.00,0.00,1168.61
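
The question also asks how to wrap everything into a script for later use. One possible sketch is to put the same awk program inside a small shell script. The file name prep_flights.sh and the function name prep_flights are hypothetical, chosen only for illustration; the awk program itself is unchanged from above.

```shell
#!/bin/sh
# prep_flights.sh (hypothetical name): wraps the awk program shown above.

# prep_flights FILE: print the selected columns of FILE to standard output.
prep_flights() {
    awk '
        BEGIN {
            # days[1] = "Monday" ... days[7] = "Sunday"
            split("Monday Tuesday Wednesday Thursday Friday Saturday Sunday", days)
            FS = OFS = ","
        }
        NR > 1 {
            gsub(/"/, "")      # strip double quotes from every data row
            $2 = days[$2]      # day number -> day name
            $21 *= 1.852       # nautical miles -> kilometers
        }
        { print $1, $2, $10, $13, $14, $15, $19, $20, $21 }
    ' "$1"
}

# Only run when an input file was passed on the command line.
if [ "$#" -ge 1 ]; then
    prep_flights "$1"
fi
```

With the file names from the question, it could then be run as sh prep_flights.sh Jan_2020_ontime.csv > flights_jan_20.csv.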

Answer 2

#!/bin/awk -f
BEGIN {
    dow[1] = "Monday"
    dow[2] = "Tuesday"
    dow[3] = "Wednesday"
    dow[4] = "Thursday"
    dow[5] = "Friday"
    dow[6] = "Saturday"
    dow[7] = "Sunday"

    FS=OFS=","
}

NR == 1 {print $1, $2, $10, $13, $14, $15, $19, $20, $21}

NR != 1 {
    $2 = dow[$2]
    $21 *= 1.852
    gsub(/"/, "", $10)
    gsub(/"/, "", $13)
    gsub(/"/, "", $14)
    print $1, $2, $10, $13, $14, $15, $19, $20, $21
}

Save this in a file such as sample.awk, make it executable with chmod +x sample.awk, and then run it with ./sample.awk data

To save the output to another file, add the output redirection operator, like this: ./sample.awk data > out.csv
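
If ./sample.awk fails with an error such as "bad interpreter", awk is probably installed somewhere other than /bin/awk on that system. A minimal sketch of two ways around that, assuming the script is saved as sample.awk in the current directory; the function name run_sample is hypothetical:

```shell
# Print the path of the awk that the shell finds; if it is not /bin/awk,
# use that path in the #! line of sample.awk instead.
command -v awk

# Alternatively, ignore the #! line and hand the program file to awk with -f.
run_sample() {
    awk -f sample.awk "$1"
}
```

For example, run_sample data > out.csv would write the result to out.csv, the same as the redirection shown above.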
