CSV to TSV with nested quotes

Question 1

With mlr (Miller):

mlr -N --icsv --otsvlite cat < file.csv > file.tsv

Or:

mlr -N --c2t --quote-none cat < file.csv > file.tsv

However, if a CSV field contains a tab character, it is not escaped in the output, so the row ends up with extra fields.
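One way to notice the extra-field problem is to count the tab-separated fields on each line of the result. The following is only a sketch: `tsv_field_counts` is a hypothetical helper name, and the check assumes that no field contains a newline, since that would also split a row.

```shell
# tsv_field_counts: hypothetical helper, not part of mlr.
# Prints each distinct field count followed by the number of lines
# that have it. More than one output line means the rows are uneven.
tsv_field_counts() {
  awk -F '\t' '{ n[NF]++ } END { for (k in n) print k, n[k] }' "$@" | sort -n
}

# Sample input: the second row has one field too many.
printf 'a\tb\tc\nd\te\tf\tg\n' | tsv_field_counts
```

On a real conversion this would be run as `tsv_field_counts file.tsv`; a single output line suggests that every row has the same number of fields.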

With GNU sed, you could do something like:

sed -E '
  # append next line as long as there is not an even number
  # of "s, to handle fields with newline. You can omit this line
  # if the fields are guaranteed not to contain newlines:
  :1; /^([^"]*"[^"]*")*[^"]*$/! {N;b1}

  s/$/,/
  s/(([^,"]*)|"((""|[^"])*)"),/\2\3\t/g
  s/\t$//
  s/""/"/g' < file.csv > file.tsv

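The input is assumed to be valid text in the current locale. To disable localisation for sed, run it as LC_ALL=C sed ... so that the input is treated as binary. This avoids decoding problems, and may also be faster if speed is a concern.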
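As a sketch of the LC_ALL=C variant, the same sed script can be wrapped in a shell function that reads CSV on standard input. The name `csv2tsv` is made up for illustration, and GNU sed is assumed (for -E and for \t in the regex and the replacement).

```shell
# csv2tsv: hypothetical wrapper around the sed script above.
# LC_ALL=C makes sed treat the input as bytes rather than decoding it.
csv2tsv() {
  LC_ALL=C sed -E '
    # join lines until the number of double quotes is even
    :1; /^([^"]*"[^"]*")*[^"]*$/! {N;b1}
    s/$/,/
    s/(([^,"]*)|"((""|[^"])*)"),/\2\3\t/g
    s/\t$//
    s/""/"/g'
}

# Prints the three fields  a / test, part2 "the start" / b  separated by tabs.
printf '%s\n' 'a,"test, part2 ""the start""",b' | csv2tsv
```

For a whole file this would be called as `csv2tsv < file.csv > file.tsv`.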

Answer

Question 2

With bash 5.1 and its loadable csv module:

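# Look for loadable builtins next to the running bash (bin/ replaced by lib/)
BASH_LOADABLES_PATH=${BASH/\/bin\//\/lib\/}
enable -f csv csv
line='a,"test, part2 ""the start""",b'
# Split the CSV line into the array "fields"
csv -a fields "$line"
# Join the fields with tabs
new_line=$(IFS=$'\t'; echo "${fields[*]}")
declare -p line fields new_line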

Output:

declare -- line="a,\"test, part2 \"\"the start\"\"\",b"
declare -a fields=([0]="a" [1]="test, part2 \"the start\"" [2]="b")
declare -- new_line="a  test, part2 \"the start\"   b"
#.....................^ tab......................^ tab

This does not work if any field contains a tab.
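One possible workaround, sketched here under the assumption that losing the original tab is acceptable, is to replace any tab inside a field with a space before joining. `join_tsv` is a hypothetical helper name, not something the csv loadable provides.

```shell
# join_tsv: hypothetical helper.
# Replaces every tab inside each argument with a space,
# then prints the arguments joined by tabs.
join_tsv() {
  local IFS=$'\t'
  local tab=$'\t'
  local -a clean=("${@//$tab/ }")
  printf '%s\n' "${clean[*]}"
}

# Stand-in for the array that "csv -a fields" would fill in.
fields=(a $'has\ttab' b)
join_tsv "${fields[@]}"
```

The output keeps exactly three tab-separated fields, with the embedded tab in the middle field turned into a space. Whether a space is the right substitute depends on the data; another placeholder could be used in the same expansion.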

In a pipeline:

IFS=$'\t'
cat file |
while IFS= read -r line; do
    csv -a fields "$line"
    echo "${fields[*]}"
done |
tail

Although this is more idiomatic bash:

IFS=$'\t'
while IFS= read -r line; do
    csv -a fields "$line"
    echo "${fields[*]}"
done < file | tail

Answer