Merging multiple CSV files with identical headers into multiple group files

Answer 1

Will duplicate headers only ever be found in adjacent files? This answer is written to handle the case where they are not.

#!/bin/bash

# Declare header_list[] to be an associative array
declare -A header_list

# Read the first line from every *.csv file in $1
# Each filename is added to the appropriate entry in header_list[]
for f in "${1:?}"/*.csv; do
  [ -e "$f" ] || continue   # no match: the glob pattern is left unexpanded
  echo "### Reading header from $f"
  header_list[$(head -n 1 "$f")]+="${IFS}$f"
done

# Handle the list of files for each entry in header_list[]
group_id=1
for key in "${!header_list[@]}"; do
  value="${header_list[$key]}"
  groupfile="${2:?}/GROUP-${group_id}.csv"
  echo "### Header: ${key}"
  echo "### Group File: ${groupfile}"
  
  # Echo the header as the first line of $groupfile
  echo "${key}" > "${groupfile}"
  
  # Skip the first line, but echo every other line from each file with this header
  for file in ${value}; do
    echo "# File: ${file}"
    tail --lines=+2 "$file" >> "${groupfile}"
  done
  
  # Increment group_id
  (( group_id++ ))
done

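Save this to a file and run it with two parameters: the directory containing the source files, and the output directory.

A few notes:

The output directory must already exist.
Filenames containing any character in $IFS (by default space, tab, and newline) will not be handled correctly.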
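
Both caveats can be sidestepped with a slightly different approach. The following is only a sketch, not the script above: instead of collecting filenames per header and word-splitting them later, it maps each header straight to its group file and appends as it goes, so filenames are never split. The function name merge_by_header and the array name group_of are made up for illustration. It assumes bash 4 or later (for associative arrays) and skips empty files.

```bash
#!/bin/bash

# Hypothetical helper: merge_by_header SRC_DIR OUT_DIR
merge_by_header() {
  local src="${1:?source directory required}"
  local out="${2:?output directory required}"
  local -A group_of=()   # header line -> path of the group file for that header
  local group_id=1 f header

  # Create the output directory if it is missing
  mkdir -p "$out" || return 1

  for f in "$src"/*.csv; do
    [ -f "$f" ] || continue   # no match: the glob pattern is left unexpanded

    # Read the first line; read fails on a missing final newline
    # but still fills $header, so the status is ignored
    IFS= read -r header < "$f" || true
    [ -n "$header" ] || continue   # skip empty files

    # First time this header is seen: start a new group file with it
    if [ -z "${group_of[$header]+set}" ]; then
      group_of[$header]="$out/GROUP-${group_id}.csv"
      printf '%s\n' "$header" > "${group_of[$header]}"
      group_id=$(( group_id + 1 ))
    fi

    # Append everything after the header line
    tail -n +2 "$f" >> "${group_of[$header]}"
  done
}
```

Called as merge_by_header ./src ./out, this numbers the groups in the order their headers first appear in the sorted glob, whereas the order in which bash iterates over the keys of an associative array is unspecified.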
