Find the best-matching files in two directories.

Answer 1

The following works:

for fa in A/*; do

    highest_pm=0
    best_match=
    num_lines_file_a=$(wc -l < "$fa")

    # an empty file cannot match anything (and would divide by zero below)
    [ "$num_lines_file_a" -gt 0 ] || { printf "File %s has no match\n" "$fa"; continue; }

    for fb in B/*; do

        # count the lines of $fa that diff reports as unchanged in $fb
        num_identical_lines=$(diff --unchanged-group-format='%<' --old-group-format='' --new-group-format='' --changed-group-format='' "$fa" "$fb" | wc -l)

        # save permille of matching lines
        pm=$((1000*num_identical_lines/num_lines_file_a))

        # compare with highest permille
        if [ "$pm" -gt "$highest_pm" ]; then
            highest_pm=$pm
            best_match="$fb"
        fi

    done

    # output
    if [ "$highest_pm" -gt 0 ]; then
        printf "File %s best matches File %s with %d %% of identical lines.\n" "$fa" "$best_match" "$((highest_pm/10))"
    else
        printf "File %s has no match\n" "$fa"
    fi

done

The evaluation of num_identical_lines is based on this answer.
All that remains is a loop over the files, some comparison, and some output ;-)
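
To see the counting step in isolation, here is a minimal sketch that runs it on two throwaway files. It assumes GNU diff, since the group-format options are GNU extensions. The helper name count_identical_lines and the sample file contents are made up for illustration and are not part of the answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the answer): count the lines of file $1
# that GNU diff reports as unchanged relative to file $2.
count_identical_lines() {
    diff --unchanged-group-format='%<' --old-group-format='' \
         --new-group-format='' --changed-group-format='' "$1" "$2" | wc -l
}

# Two small sample files: lines 1 and 3 agree, lines 2 and 4 differ.
tmp=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\n' > "$tmp/a"
printf 'one\nTWO\nthree\n4\n'    > "$tmp/b"

identical=$(count_identical_lines "$tmp/a" "$tmp/b")
total=$(wc -l < "$tmp/a")
pm=$((1000 * identical / total))   # permille, using integer arithmetic

printf '%d of %d lines identical (%d %%)\n' "$identical" "$total" "$((pm / 10))"
rm -rf "$tmp"
```

With GNU diff this should report 2 of 4 lines as identical, i.e. 50 %. Working in permille rather than percent keeps one extra digit of precision when comparing candidates, because shell arithmetic is integer-only.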

Output:

File A/file2 has no match
File A/filea best matches File B/fileb with 50 % of identical lines.
