How to merge two files based on a single key column, automatically correct fixed columns, and fill in missing data

Answer 1

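Date/time arithmetic has always been ... tricky, especially when a date/time series crosses midnight, the end of a month or year, or a switch to daylight saving time. To be on the safe side, epoch seconds are used here. Converting back to date/time with the date command as done in this script may not work on every *nix version. I also set the TZ variable to "UTC" to avoid DST problems. Try it without and you will see. Give it a try.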
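The DST remark can be checked in isolation. The following is only a sketch, assuming GNU date (for -d) and a POSIX TZ string that describes Central European Time; the function name day_length is made up for this illustration. It measures the distance between two consecutive local midnights around the 2020 spring switch:

```shell
#!/usr/bin/env bash
# Seconds between local midnight of 2020-03-29 and local midnight of 2020-03-30.
# $1 is the TZ value to use. Assumption: GNU date, which accepts -d.
day_length() {
    local start end
    start=$(TZ=$1 date -d '2020-03-29 00:00:00' +%s)
    end=$(TZ=$1 date -d '2020-03-30 00:00:00' +%s)
    echo $(( end - start ))
}

# POSIX TZ string: CET (UTC+1), DST from the last Sunday of March
dst_len=$(day_length 'CET-1CEST,M3.5.0,M10.5.0/3')
utc_len=$(day_length UTC)

echo "with DST: $dst_len"    # with DST: 82800
echo "UTC:      $utc_len"    # UTC:      86400
```

With a fixed step of 86400 seconds, a day that is only 82800 seconds long shifts every later timestamp away from local midnight, which is why the script pins TZ to UTC.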

export TZ=UTC                                                       # get rid of side effects, e.g. DST switching

cut -d, -f8 samplefile | date -f- +%s | paste - samplefile > TMP1   # prepend epoch seconds to the input file

{ read MIN DUMMY                                                    # get the MIN and MAX dates of the file
  MAX=$MIN                                                          # covers an input file with a single line
  while read TMP DUMMY
     do MAX=$TMP
     done                                                           # and calculate a sequence of days between them
  eval echo @{$MIN..$MAX..86400} | tr ' ' $'\n' | date -f- +$'%s\t%Y-%m-%d\t%y%j'
} < TMP1 > TMP2                                                     # in epoch, yyyy-mm-dd, and julian (yyddd) format

join -a1 -a2 -- TMP1 TMP2 | awk -F"[, ]" '                          # join first and second intermediate files
NF == 3         {split($0, TMPINS)                                  # line missing from original file: keep epoch, date, julian date
                 $0  = SAVED                                        # start from the last saved complete line
                 $9  = TMPINS[2]                                    # overwrite the date copied from that line
                 $NF = TMPINS[3]                                    # overwrite the julian date
                 $11 = $12 = $13 = $14 = -999                       # set invalid indicator
                }
NF >= 13        {SAVED = $0                                         # complete line? save it
                 $1 = $1                                            # recreate line with OFS char
                }

                {sub($1",","")                                      # for all lines: remove leading epoch field
                 $14 = $15                                          # put julian date into right place
                 NF--                                               # get rid of last field; may not work in ALL awks
                }
1                                                                   # default action: print
' OFS=","

Output:
06,037,0016,42101,34.14435,-117.85036,1-HOUR,2020-01-26,Parts-per-million,24,100.0,0.379167,10,20026
06,037,0016,42101,34.14435,-117.85036,1-HOUR,2020-01-27,Parts-per-million,24,100.0,0.2875,10,20027
06,037,0016,42101,34.14435,-117.85036,1-HOUR,2020-01-28,Parts-per-million,11,46.0,0.163636,10,20028
06,037,0016,42101,34.14435,-117.85036,1-HOUR,2020-01-29,Parts-per-million,-999,-999,-999,-999,20029
06,037,0016,42101,34.14435,-117.85036,1-HOUR,2020-01-30,Parts-per-million,20,83.0,0.23,10,20030
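
The least obvious line in the script is the one that builds the day sequence. Here is a minimal sketch of just that step, assuming bash 4 or later (for the step value in brace expansion) and GNU date (for -d, -f and @epoch input). The two hard-coded dates are placeholders for the first and last day found in the file:

```shell
#!/usr/bin/env bash
export TZ=UTC

MIN=$(date -d 2020-01-28 +%s)    # first day as epoch seconds (placeholder value)
MAX=$(date -d 2020-01-30 +%s)    # last day as epoch seconds (placeholder value)

# Brace expansion runs before variable expansion, so {$MIN..$MAX..86400}
# only expands once eval re-parses the line with the numbers filled in.
# The leading @ is copied onto every element; GNU date reads @N as epoch seconds.
days=$(eval echo @{$MIN..$MAX..86400} | tr ' ' '\n' | date -f- +$'%s\t%Y-%m-%d\t%y%j')

printf '%s\n' "$days"
# 1580169600	2020-01-28	20028
# 1580256000	2020-01-29	20029
# 1580342400	2020-01-30	20030
```

The middle line, 2020-01-29, is the day that has no record in the data, and it is exactly what join later reports as an unmatched line.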

With the help of a FIFO, you can write everything as one long command pipeline:

mkfifo TMPFIFO
cut -d, -f8 samplefile | date -f- +%s | tee -a >(read MIN; while read TMP; do MAX=$TMP; done; eval echo @{$MIN..$MAX..86400} | tr ' ' $'\n' > TMPFIFO) | paste - samplefile | join -a1 -a2 -- - <(date -fTMPFIFO +$'%s\t%Y-%m-%d\t%y%j') | awk -F"[, ]" 'NF == 3 {split($0, TMPINS); $0 = SAVED; $9 = TMPINS[2]; $NF = TMPINS[3]; $11 = $12 = $13 = $14 = -999} NF >= 13 {SAVED = $0; $1 = $1} {sub($1",",_); $14 = $15; NF--} 1' OFS=","
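
Both variants hand awk the output of join -a1 -a2, and the NF == 3 test only makes sense once you see what that output looks like. The sketch below is a hypothetical miniature: the data lines have three columns instead of thirteen, and 100, 200, 300 are placeholder keys standing in for real epoch seconds.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)

# Miniature TMP1: key, tab, original CSV line (the record for key 200 is missing)
printf '100\ta,2020-01-28,11\n300\ta,2020-01-30,20\n' > "$workdir/data"
# Miniature TMP2: key, date, julian date for every day in the range
printf '100\t2020-01-28\t20028\n200\t2020-01-29\t20029\n300\t2020-01-30\t20030\n' > "$workdir/days"

# -a1 -a2 also prints lines that have no partner in the other file
joined=$(join -a1 -a2 -- "$workdir/data" "$workdir/days")
printf '%s\n' "$joined"
# 100 a,2020-01-28,11 2020-01-28 20028
# 200 2020-01-29 20029
# 300 a,2020-01-30,20 2020-01-30 20030

# Count fields the way the awk script does: split on commas and spaces
counts=$(printf '%s\n' "$joined" | awk -F'[, ]' '{ printf "%s%s", sep, NF; sep = " " } END { print "" }')
echo "$counts"    # 6 3 6

rm -rf "$workdir"
```

A line that exists only in the day list has exactly three fields, so NF == 3 identifies a missing record. With the real thirteen-column data, complete lines have sixteen fields, which is why NF >= 13 safely selects them.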
