How do I compare two files using only two columns and print the differences without sorting?

Question 1

With awk:

$ awk -v FS="\t" -v OFS="\t" 'NR==FNR {trans[$2"|"$3]++; next;} FNR==1 {print} FNR>1 {if(!trans[$2"|"$3]) print}' file2 file1

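It first reads file2 and stores the values of columns 2 and 3 (joined as "col2|col3") as keys in an array.
When it reads file1, it prints the header row. For every following line, it checks whether the key built from that line's column 2 and column 3 values exists in the array created earlier. If it does not, the line is printed.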
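To see the awk command above in action, here is a minimal self-contained run on hypothetical tab-separated sample files (the file contents and the names tmp and result are made up for illustration, not taken from the question). It shows that only the column 2 + column 3 pair matters and that file1's original order is kept without any sorting.

```shell
#!/bin/bash
# Hypothetical sample data: tab-separated, one header row,
# columns 2 and 3 are the key columns.
tmp=$(mktemp -d)
printf 'id\tgene\tpos\tnote\n1\tA\t10\tx\n2\tB\t20\ty\n3\tC\t30\tz\n' > "$tmp/file1"
printf 'id\tgene\tpos\tnote\n9\tB\t20\told\n8\tC\t99\told\n' > "$tmp/file2"

# Same program as the answer: remember "col2|col3" pairs from file2,
# then print file1's header and every file1 row whose pair was not seen.
result=$(awk -v FS="\t" -v OFS="\t" '
    NR==FNR { trans[$2"|"$3]++; next }
    FNR==1  { print }
    FNR>1   { if (!trans[$2"|"$3]) print }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -r "$tmp"
```

Row 2 (B, 20) is dropped because that exact pair exists in file2; row 3 (C, 30) is kept because file2 only has C with 99.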

Question 2

You haven't clearly explained or defined how the files should be compared.

But that won't stop me from trying to read your mind...

As far as I can tell, file 2 is some kind of database or reference file, and file 1 contains the new data.

What I understand by "compare": if the value in column 2 or column 3 of a file 1 line already exists in file 2 (the reference), don't print/include that line. Otherwise, print/include it.
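The same "column 2 or column 3" rule can also be sketched in awk (an alternative I'm swapping in, not the bash script this answer uses). Like that script, it puts the column 2 and column 3 values of file 2 into one shared lookup table and skips file 2's header. The sample data and the names tmp, seen and result are hypothetical.

```shell
#!/bin/bash
# Hypothetical sample data (tab-separated, header row first).
tmp=$(mktemp -d)
printf 'id\tgene\tpos\n1\tA\t10\n2\tB\t20\n3\tC\t30\n4\tD\t40\n' > "$tmp/file1"
printf 'id\tgene\tpos\n9\tB\t77\n8\tX\t30\n' > "$tmp/file2"

# Drop a file1 row if its column 2 OR column 3 value appears anywhere in
# columns 2/3 of file2. file2's header is not stored, so file1's header
# survives. Output keeps file1's order; nothing is sorted.
result=$(awk -F'\t' '
    NR==FNR { if (FNR > 1) { seen[$2] = 1; seen[$3] = 1 }; next }
    !(($2 in seen) || ($3 in seen))
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -r "$tmp"
```

Row 2 is dropped because B appears in file2, and row 3 because 30 appears in file2, even though neither matches as a full pair.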

The good news is that, as you asked, no sorting is needed...

Here is a script that takes 2 parameters. The first is the new data file (file 1 in the example). The second is the database file (file 2 in the example).

#!/bin/bash

new_file=$1
db_file=$2

# Just checking the last parameter
if [ -z "$db_file" ]; then
    echo >&2 "[ERROR] This script expects 2 file paths as parameters."
    exit 1
fi

if [ ! -f "$new_file" ]; then
    echo >&2 "[ERROR] First parameter file doesn't exist."
    exit 2
fi

if [ ! -f "$db_file" ]; then
    echo >&2 "[ERROR] Second parameter file doesn't exist."
    exit 3
fi


# Associative arrays require bash 4 or later
declare -A data_base

# Open both files and assign them to file descriptors 10 and 11
exec 10< "$new_file"
exec 11< "$db_file"

# Step 1
# Build the map of base data first (the comparison happens in the next step)
first_line=1
while true
do
    read -r -u 11 db_file_col1 db_file_col2 db_file_col3 db_file_rest || break

    # Skip the db header so the new file's header appears in the diff, as in the example
    if [ "$first_line" -ne 0 ]; then
        first_line=0
        continue
    fi

    # Map the Col 2 and Col 3 values (keys) to the whole line (value)
    data_base["$db_file_col2"]="$db_file_col1 $db_file_col2 $db_file_col3 $db_file_rest"
    data_base["$db_file_col3"]="$db_file_col1 $db_file_col2 $db_file_col3 $db_file_rest"
done


# Step 2
# Actual comparison: print a line only if neither its Col 2 nor its Col 3 value is known
while true
do
    read -r -u 10 new_file_col1 new_file_col2 new_file_col3 new_file_rest || break

    if [ -z "${data_base[$new_file_col2]}" ] && [ -z "${data_base[$new_file_col3]}" ]; then
        echo "$new_file_col1 $new_file_col2 $new_file_col3 $new_file_rest"
    fi
done

# Close both file descriptors
exec 10<&- 11<&-

For example, if you save the script in a file named process.sh and make it executable with "chmod 755 process.sh", running

./process.sh file1 file2

produces exactly the expected output.

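Note: this script keeps at least twice the contents of file 2 in memory, because each line is stored once under its column 2 value and once under its column 3 value. Make sure you have enough memory...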
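Since step 2 only checks whether a key exists, one way to reduce that memory use is to store a small marker per key instead of the whole line. Below is a minimal sketch of that idea as a bash function; the name filter_new, the "k:" key prefix and the sample files are hypothetical, not part of the script above. The prefix keeps an empty column from becoming an empty (invalid) associative-array subscript.

```shell
#!/bin/bash
# Hypothetical variant of process.sh: filter_new NEW_FILE DB_FILE
# Same rule, but the map stores "1" per key instead of the whole line.
# Requires bash 4+ for associative arrays.
filter_new() {
    local new_file=$1 db_file=$2
    local -A seen=()
    local c1 c2 c3 rest first=1

    # Step 1: remember every non-empty col 2 / col 3 value of the db file
    while read -r c1 c2 c3 rest; do
        if [ "$first" -eq 1 ]; then first=0; continue; fi  # skip db header
        [ -n "$c2" ] && seen["k:$c2"]=1
        [ -n "$c3" ] && seen["k:$c3"]=1
    done < "$db_file"

    # Step 2: print new rows whose col 2 and col 3 are both unknown
    while read -r c1 c2 c3 rest; do
        if [ -z "${seen["k:$c2"]}" ] && [ -z "${seen["k:$c3"]}" ]; then
            echo "$c1 $c2 $c3 $rest"
        fi
    done < "$new_file"
}

# Usage with hypothetical tab-separated sample files
tmp=$(mktemp -d)
printf 'id\tgene\tpos\tnote\n1\tA\t10\tx\n2\tB\t20\ty\n3\tC\t30\tz\n4\tD\t40\tw\n' > "$tmp/new"
printf 'id\tgene\tpos\tnote\n9\tB\t77\told\n8\tX\t30\told\n' > "$tmp/db"
result=$(filter_new "$tmp/new" "$tmp/db")
printf '%s\n' "$result"
rm -r "$tmp"
```

As in the original script, output fields are joined with single spaces rather than tabs.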
