Question

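I am comparing two CSV files, each with two columns: a file name and the file's hash. I need to find the files whose hashes do not match, as well as new files in second.csv that do not exist in first.csv. For the sample data here, I want to output the file names BlockchainContextHelp.tsv and new1.tsv.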

first.csv
#File,SHA-1
BlockchainContextHelp.tsv,1234562eertyrtyty3rer
new.tsv,7777hhrtdk12kefk23kfmsd

second.csv
#File,SHA-1
BlockchainContextHelp.tsv,123522234rrtkoe98877
new.tsv,7777hhrtdk12kefk23kfmsd
new1.tsv,3456734dfkekeruer7ererj

Here is what I have tried so far.

#!/bin/bash
while IFS="," read f1 f2;do
        while IFS="," read c1 c2;do
                if [ $f2 != $c2 ]
                then
                        echo "$f1"
                fi
        done < second.csv
done < first.csv

Any suggestions are appreciated.

Answer 1

awk is a better tool for text processing. You can use:

awk 'BEGIN {
   FS = OFS = ","                # set input/output field separator as ,
}
NR == FNR {                      # While processing the first file
   map[$1] = $2                  # store the second column by the first
   next                          # move to next record
}
!($1 in map) || map[$1] != $2 {  # In 2nd file, if $1 is not in map, or the
                                 # 2nd column differs from the hash stored
                                 # in map for that name
   print $1                      # print first column
}' first.csv second.csv

Output:

BlockchainContextHelp.tsv
new1.tsv
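
If you would rather stay in bash, the same lookup-table idea can be sketched with an associative array instead of nested loops. This is only a rough sketch, not a tested drop-in: it assumes bash 4 or later, that file names contain no commas, and that lines starting with # are headers. The function name changed_or_new is made up for illustration.

```shell
#!/bin/bash
# Assumes bash 4+ (associative arrays via "local -A").

# changed_or_new OLD NEW
# Print each file name from NEW whose hash differs from the one recorded
# in OLD, or which does not appear in OLD at all.
# (Hypothetical helper name, used here for illustration only.)
changed_or_new() {
    local old=$1 new=$2 name hash
    local -A map=()

    # Pass 1: remember the hash recorded for each file name in OLD.
    while IFS=, read -r name hash || [[ -n $name ]]; do
        # Skip blank lines and the "#File,SHA-1" header.
        [[ -z $name || $name == \#* ]] && continue
        map[$name]=$hash
    done < "$old"

    # Pass 2: report names in NEW that are missing from the map
    # or whose hash is different.
    while IFS=, read -r name hash || [[ -n $name ]]; do
        [[ -z $name || $name == \#* ]] && continue
        if [[ -z ${map[$name]+x} || ${map[$name]} != "$hash" ]]; then
            printf '%s\n' "$name"
        fi
    done < "$new"
}
```

Calling changed_or_new first.csv second.csv reads each file once, rather than re-reading second.csv for every line of first.csv as the nested loops in the question do. For large files the awk version is still likely to be faster.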
