Add a column to a csv file using values from another csv lookup table

Question 1

I assume your table and your lookup file are both CSV, use the same delimiter, and follow the same quoting rules. If not, you need to normalize them by some other means first.
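As a rough sketch of what such a normalization might look like in the simplest case, the snippet below rewrites a semicolon-delimited file with commas. The file name and sample rows are made up for illustration, and it assumes that no field is quoted or contains either delimiter; anything more complex needs a real CSV parser.

```shell
# Hypothetical sample: a semicolon-delimited table with no quoted fields.
tmp=$(mktemp -d)
printf '%s\n' 'field1;Transaction#;field3' 'ABC;2;DEF' > "$tmp/table2.csv"

# Assigning $1 to itself forces awk to rebuild each record with OFS,
# which swaps every ";" for ",".
normalized=$(awk -F ';' -v OFS=, '{ $1 = $1; print }' "$tmp/table2.csv")
printf '%s\n' "$normalized"
# field1,Transaction#,field3
# ABC,2,DEF
```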

I'll also assume that the lookup file is small enough to be read into memory. Otherwise, you would have to load your data into an SQL database.

With these assumptions, you can use awk:

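awk -F , -v OFS=, -v col=4 '
    # Skip the header line of the lookup file.
    NR == 1 { next }
    # First file (lookup.csv): map each key to its new value.
    NR == FNR {
        n[$1] = $2
    }
    # Second file (Table1.csv): append one column to every line.
    NR != FNR {
        NF++
        $NF = FNR == 1 ? "new" : n[$col]
        print
    }' lookup.csv Table1.csv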

You can adjust -F, OFS, and col above to match your tables' CSV delimiter and the relevant column.
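To make the behavior concrete, here is a small run of the command above on made-up sample data. The file contents are assumptions for illustration only: the key is in column 4 of Table1.csv, and the last row deliberately uses a key that is missing from the lookup file.

```shell
# Hypothetical sample files, created in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'Transaction#,New#' '1,122' '2,123' '3,124' > "$tmp/lookup.csv"
printf '%s\n' 'field1,field2,field3,Transaction#,field4' \
    'ABC,ABC,ABC,1,CFG' 'ABC,ABC,ABC,3,CFG' 'ABC,ABC,ABC,9,CFG' > "$tmp/Table1.csv"

# Same awk program as above, with col=4 pointing at Transaction#.
result=$(awk -F , -v OFS=, -v col=4 '
    NR == 1 { next }
    NR == FNR {
        n[$1] = $2
    }
    NR != FNR {
        NF++
        $NF = FNR == 1 ? "new" : n[$col]
        print
    }' "$tmp/lookup.csv" "$tmp/Table1.csv")
printf '%s\n' "$result"
# field1,field2,field3,Transaction#,field4,new
# ABC,ABC,ABC,1,CFG,122
# ABC,ABC,ABC,3,CFG,124
# ABC,ABC,ABC,9,CFG,
```

Note that a key with no match in the lookup file simply yields an empty trailing field.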

Answer

Question 2

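I don't think text-processing tools are up to the task. Instead, I would recommend using a proper language when handling CSV files.

Here is a suggestion in R (http://r-project.org; the name is hard to search for effectively if you don't already know it).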

#!/usr/bin/Rscript
args <- commandArgs(TRUE)

# Here, we read each table passed as argument on the commandline
tablenames <- list()
for (tablename in args) {
    header <- readLines(tablename, n=1)
    # we try to detect the separator (the character that surrounds "Transaction#")
    # That doesn't work if you use multi-characters separators
    sep <- sub(".*(.)Transaction#.*","\\1",header)
    if (nchar(sep[1]) != 1) {
        sep <- sub(".*Transaction#(.).*","\\1",header)
    }
    if (nchar(sep[1]) != 1) {
        print(paste0("Could not detect separator around column 'Transaction#' in file ",tablename))
    } else {
        # each table where the separator is successfully detected
        # is added to a list of tablenames
        tablenames[[tablename]] <- list(name=tablename,sep=sep)
    }
}

# we parse each table in the list of tablenames
tables <- lapply(tablenames, function(tab) { read.csv(tab$name, check.names=FALSE, sep=tab$sep) })

# we also parse the lookup table, which has a different format
lookup <- read.table("lookup",header=TRUE,check.names=FALSE,comment.char="")

# then for each table, we add the new column
for (i in seq_along(tablenames)) {
  # This line does the magic:
  # - it adds a new column called "New#" to the table
  # - this column is populated from table lookup
  # - lines in lookup are filtered and ordered so that column "Transaction#" matches columns "Transaction#" in the table
  # - we add only column "New#" from lookup to the table
  tables[[i]][,"New#"] <- lookup[match(tables[[i]][,"Transaction#"],lookup[,"Transaction#"]),"New#"]

  # we write back the table under the name "new <original name>"
  write.table(tables[[i]], file=paste("new",tablenames[[i]]$name), sep=tablenames[[i]]$sep, quote=FALSE, row.names=FALSE)
}

You need to call this script from the directory where your tables are:

./script table1 table2 ...

where table1, table2, ... are the file names of your tables. As the script is written, the lookup table must be in a file named lookup, but that is easy to change.

For example:

table1

field1,field2,field3,Transaction#,field4
 ABC,ABC,ABC,1,CFG
 ABC,ABC,ABC,3,CFG

table2

field1;Transaction#;field3;field4;field5
ABC;2;ABC;ABC;CFG
ABC;1;ABC;ABC;CFG
ABC;3;ABC;ABC;CFG

We run ./script.R table1 table2.

lookup

Transaction#   New#
    1            122
    2            123
    3            124

Result:

new table1

field1,field2,field3,Transaction#,field4,New#
 ABC,ABC,ABC,1,CFG,122
 ABC,ABC,ABC,3,CFG,124

new table2

field1;Transaction#;field3;field4;field5;New#
ABC;2;ABC;ABC;CFG;123
ABC;1;ABC;ABC;CFG;122
ABC;3;ABC;ABC;CFG;124
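
For comparison, the two ideas the R script relies on (guessing the separator from the character next to Transaction#, and matching on the column name rather than its position) can be roughly sketched in awk as well. This is only an illustration under stated assumptions, not the method of the answer above: it assumes a single-character separator and no quoted fields, and it prints to standard output instead of writing a new file. The variable names are arbitrary.

```shell
# Sample data copied from the example above, in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'Transaction#   New#' '    1            122' \
    '    2            123' '    3            124' > "$tmp/lookup"
printf '%s\n' 'field1;Transaction#;field3;field4;field5' \
    'ABC;2;ABC;ABC;CFG' 'ABC;1;ABC;ABC;CFG' 'ABC;3;ABC;ABC;CFG' > "$tmp/table2"

result=$(awk '
    # First file: whitespace-delimited lookup, read with default splitting.
    NR == FNR { if (FNR > 1) newval[$1] = $2; next }
    # Table header: take the character next to "Transaction#" as the
    # separator, then find that column by name.
    FNR == 1 {
        i = index($0, "Transaction#")
        if (i == 0) { print "no Transaction# column in " FILENAME > "/dev/stderr"; exit 1 }
        sep = (i > 1) ? substr($0, i - 1, 1) : substr($0, i + length("Transaction#"), 1)
        n = split($0, h, sep)
        for (c = 1; c <= n; c++) if (h[c] == "Transaction#") col = c
        print $0 sep "New#"
        next
    }
    # Data rows: look up the key and append the new value.
    {
        split($0, f, sep)
        print $0 sep newval[f[col]]
    }' "$tmp/lookup" "$tmp/table2")
printf '%s\n' "$result"
```

For the sample table2 this should print the same lines as the new table2 shown above. A quoted field containing the separator would break this sketch, which is the kind of case a proper CSV parser handles.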

Answer