I need to track a huge log, or more precisely one column of it. This column contains integer values from 103 to 17431. Example from the original file:
402
402
402
667
942
342
990
402
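I need to assign each number an index value from 0 to 9. My plan is to split the column of interest into a separate file, then go through each row and replace the number found with its index. The final output should look something like this: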
3
3
5
9
7
8
3
I tried a solution with AWK, but it failed. My code:
csvtool col 2 /my/path/to/list.csv >tmp
awk '($0>=363 && $0<=499) || ($0>=4645 && $0<=4646) {$0="0"}1' tmp
awk '($0>=2174 && $0<=2193) {$0="1"}1' tmp
awk '($0=500) || ($0>=12308 && $0<=12356) {$0="2"}1' tmp
awk '($0>=103 && $0<=220) || ($0>=252 && $0<=299) || ($0>=1980 && $0<=1986) || ($0>=2921 && $0<=2922) {$0="3"}1' tmp
awk '($0>=221 && $0<=251) || ($0>=8085 && $0<=8091) || ($0=8350) || ($0>=12809 && $0<=12945) || ($0>=16834 && $0<=17033) {$0="4"}1' tmp
awk '($0>=300 && $0<=362) || ($0=522) || ($0>=2923 && $0<=2925) || ($0>=3441 && $0<=3442) || ($0=4644)|| ($0>=5677 && $0<=5695) || ($0>=8082 && $0<=8083)|| ($0>=8093 && $0<=8349) || ($0>=12946 && $0<=12947) || ($0>=21986 && $0<=13215) || ($0>=13309 && $0<=13311) {$0="5"}1' tmp
awk '($0>=501 && $0<=504) || ($0>=566 && $0<=600) || ($0>=613 && $0<=637) || ($0>=2015 && $0<=2040) || ($0>=2103 && $0<=2126) || ($0>=2373 && $0<=2374) || ($0>=3828 && $0<=4125) || ($0>=4237 && $0<=4636) || ($0>=4647 && $0<=4889) || ($0>=4991 && $0<=5676) || ($0>=5696 && $0<=5705) || ($0>=6502 && $0<=6595) || ($0>=8429 && $0<=8460) || ($0>=8552 && $0<=8699) || ($0>=10487 && $0<=10977) || ($0>=11326 && $0<=11617) || ($0>=11688 && $0<=11815) || ($0>=11844 && $0<=11938) || ($0>=12490 && $0<=12597) || ($0>=12973 && $0<=12982) || ($0>=13367 && $0<=13414) {$0="6"}1' tmp
awk '($0>=523 && $0<=548) || ($0>=555 && $0<=565) || ($0>=2005 && $0<=2014) || ($0>=2041 && $0<=2063) || ($0>=2091 && $0<=2102) || ($0=2394) || ($0>=2407 && $0<=2411) || ($0>=2926 && $0<=3008) || ($0>=3443 && $0<=3473) || ($0>=3486 && $0<=3813) || ($0>=4132 && $0<=4144) || ($0>=4637 && $0<=4643) || ($0>=4916 && $0<=4981) || ($0>=5711 && $0<=5741) || ($0>=6403 && $0<=6405) || ($0>=6415 && $0<=6466) || ($0>=6701 && $0<=7002) || ($0>=7035 && $0<=7048) || ($0>=8426 && $0<=8428) || ($0>=8496 && $0<=8541) || ($0>=8857 && $0<=9323) || ($0>=9429 && $0<=9618) || ($0>=9674 && $0<=9789) || ($0>=9802 && $0<=9811) || ($0>=9850 && $0<=10009) || ($0>=10131 && $0<=10136) || ($0>=10396 && $0<=10402) || ($0>=11000 && $0<=11175) || ($0=11618) || ($0>=12100 && $0<=12111) || ($0>=12212 && $0<=12219) || ($0=12489) || ($0>=12807 && $0<=12808) || ($0=12983) || ($0>=14616 && $0<=14627) || ($0>=15723 && $0<=15897) {$0="7"}1' tmp
awk '($0=521) || ($0=554) || ($0>=601 && $0<=612) || ($0>=651 && $0<=708) || ($0>=1905 && $0<=1942) || ($0>=1949 && $0<=1979) || ($0>=1987 && $0<=1993) || ($0>=2259 && $0<=2278) || ($0>=2352 && $0<=2362) || ($0>=2395 && $0<=2406) || ($0>=2412 && $0<=2449) || ($0>=2673 && $0<=2919) || ($0>=3009 && $0<=3016) || ($0>=3814 && $0<=3827) || ($0>=4126 && $0<=4131) || ($0>=4982 && $0<=4990) || ($0>=5706 && $0<=5710) || ($0>=6012 && $0<=6181) || ($0>=6285 && $0<=6339) || ($0>=6409 && $0<=6411) || ($0>=6596 && $0<=6700) || ($0>=7191 && $0<=7424) || ($0=8081) || ($0>=8550 && $0<=8551) || ($0>=8700 && $0<=8716) || ($0>=9324 && $0<=9326) || ($0>=9619 && $0<=9624) || ($0=9729) || ($0>=10018 && $0<=10064) || ($0>=10115 && $0<=10126) || ($0>=10198 && $0<=10386) || ($0=10486) || ($0>=12112 && $0<=12115) || ($0>=12209 && $0<=12211) {$0="8"}1' tmp
awk '($0>=489 && $0<=498) || ($0>=505 && $0<=520) || ($0>=549 && $0<=553) || ($0>=638 && $0<=650) || ($0>=709 && $0<=1904) || ($0>=1943 && $0<=1948) || ($0>=1994 && $0<=2004) || ($0>=2064 && $0<=2090) || ($0>=2127 && $0<=2173) || ($0>=2194 && $0<=2258) || ($0>=2279 && $0<=2351) || ($0>=2363 && $0<=2372) || ($0=2393) || ($0>=2450 && $0<=2672) || ($0>=3474 && $0<=3485) || ($0>=4145 && $0<=4236) || ($0>=4890 && $0<=4915) || ($0>=5742 && $0<=6011) || ($0>=7003 && $0<=7034) || ($0>=7049 && $0<=7295) || ($0>=7425 && $0<=8080) || ($0=8084) || ($0>=8352 && $0<=8425) || ($0>=8461 && $0<=8495) || ($0>=8542 && $0<=8549) || ($0>=8717 && $0<=8856) || ($0>=9327 && $0<=9428) || ($0>=9625 && $0<=9673) || ($0>=9790 && $0<=9791) || ($0>=9793 && $0<=9801) || ($0>=9812 && $0<=9849) || ($0>=10010 && $0<=10017) || ($0>=10065 && $0<=10114) || ($0>=10128 && $0<=10130) || ($0>=10137 && $0<=10197) || ($0>=10387 && $0<=10395) || ($0>=10403 && $0<=10485) || ($0>=10978 && $0<=10999) || ($0>=11176 && $0<=11325) || ($0>=11620 && $0<=11687) || ($0>=11816 && $0<=11843) || ($0>=11939 && $0<=12099) || ($0>=12116 && $0<=12208) || ($0>=12220 && $0<=12307) || ($0>=12357 && $0<=12488) || ($0>=12598 && $0<=12806) || ($0>=12948 && $0<=12972) || ($0>=13216 && $0<=13306) || ($0>=13312 && $0<=13366) || ($0>=13415 && $0<=14615) || ($0>=14628 && $0<=15722) || ($0>=15989 && $0<=16833) || ($0>=17402 && $0<=17431) {$0="9"}1' tmp
Unfortunately, the code above produces:
9
9
9
9
9
9
9
Any ideas on how to make this work? Is there another approach? Thanks.
Answer 1
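To compare $0 with a value, use == and not =. The = operator assigns a new value to $0. Once the assignment happens, awk evaluates an expression such as $0=2393 as true (the assigned value 2393 is non-zero), so the action {$0="9"} runs and the trailing 1 prints 9.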
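Below is a minimal sketch of the fix, assuming a POSIX sh and any POSIX awk. It also addresses a second issue in the question: each of the nine awk commands reads tmp from scratch and prints the whole file, so even with == their results would never combine. The sketch instead makes a single pass over a range table and lets the first matching range win. The function name classify and the ? marker for uncovered values are hypothetical, and the table holds only a few of the question's ranges (enough for the sample data). The question's ranges overlap in places (489 to 498 is listed under both 0 and 9), so with first-match-wins the table order matters, and the range 21986 to 13215 can never match as written.

```shell
#!/bin/sh
# The pitfall: in awk, `=` assigns and `==` compares.
# ($0=500) stores 500 in $0; 500 is non-zero, so the pattern is always true.
echo 402 | awk '($0=500) { print "matched, $0 is now", $0 }'   # prints: matched, $0 is now 500
echo 402 | awk '($0==500) { print "matched" }'                 # prints nothing

# Single-pass classifier: every line is tested against one range table and
# the first matching range wins. The table is only a SUBSET of the question's
# ranges (enough to cover the sample data); extend it with the rest.
# Entry format: "low high index", inclusive bounds, entries separated by commas.
classify() {
  awk '
    BEGIN {
      t = "103 220 3,221 251 4,252 299 3,300 362 5,363 499 0,500 500 2,651 708 8,709 1904 9"
      n = split(t, r, ",")
      for (i = 1; i <= n; i++) {
        split(r[i], f, " ")
        lo[i] = f[1] + 0; hi[i] = f[2] + 0; idx[i] = f[3]
      }
    }
    {
      v = $0 + 0                 # force a numeric comparison
      out = "?"                  # hypothetical marker for values no range covers
      for (i = 1; i <= n; i++)
        if (v >= lo[i] && v <= hi[i]) { out = idx[i]; break }
      print out
    }
  ' "$@"
}
```

For the sample values 402, 667, 942, 342 and 990, classify prints 0, 8, 9, 5 and 9. With the question's file, no temporary file is needed: csvtool col 2 /my/path/to/list.csv | classify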
Answer 2
perl -pi -e 's/(^[^,]*,\d)\d+,/$1,/g' a.csv
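This classifies each value by its first digit: it replaces the second CSV field with its leading digit, editing a.csv in place (-i).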
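To see what the substitution does, here is a small demo on made-up CSV rows (the sample rows and the function name first_digit_awk are hypothetical). The -i flag is dropped so perl prints to stdout instead of editing a.csv. The awk function is an equivalent written for comparison, not part of the original answer; unlike the perl pattern, it rewrites field 2 on every line, including a header row if there is one.

```shell
#!/bin/sh
# Made-up rows; the second column holds values from the question.
sample='id,402,x
id,667,x
id,942,x'

# The answer's substitution, minus -i, so the result goes to stdout.
# It captures "everything up to the first comma, the comma, and the first digit",
# drops the remaining digits of field 2, and keeps the following comma.
printf '%s\n' "$sample" | perl -pe 's/(^[^,]*,\d)\d+,/$1,/g'

# Hypothetical awk equivalent: truncate field 2 to its first character.
first_digit_awk() {
  awk -F, -v OFS=, '{ $2 = substr($2, 1, 1) } 1' "$@"
}
printf '%s\n' "$sample" | first_digit_awk
```

Both commands print id,4,x, id,6,x and id,9,x for these rows. Because the values have no leading zeros, this grouping only produces indexes 1 to 9, so it is a different mapping from the question's range table.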