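I am comparing two CSV files that each have two columns: the file name and the file's hash. I need to find the files whose hashes do not match, as well as the new files in second.csv that do not exist in first.csv. For the sample data below, I want to output file names such as BlockchainContextHelp.tsv and new1.tsv.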
first.csv:
#File,SHA-1
BlockchainContextHelp.tsv,1234562eertyrtyty3rer
new.tsv,7777hhrtdk12kefk23kfmsd
second.csv:
#File,SHA-1
BlockchainContextHelp.tsv,123522234rrtkoe98877
new.tsv,7777hhrtdk12kefk23kfmsd
new1.tsv,3456734dfkekeruer7ererj
Here is what I have tried so far:
#!/bin/bash
while IFS="," read -r f1 f2; do
    while IFS="," read -r c1 c2; do
        # Quote the variables so an empty field does not break the test
        if [ "$f2" != "$c2" ]
        then
            echo "$f1"
        fi
    done < second.csv
done < first.csv
Any suggestions are appreciated.
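The nested loop in the attempt compares every row of first.csv with every row of second.csv, so it prints a first.csv name for most row pairs rather than only for real mismatches, and because it only ever prints $f1, a new name such as new1.tsv can never appear. One way to stay in plain bash is to read first.csv once into a lookup table keyed by file name and then make a single pass over second.csv. This is a minimal sketch assuming bash 4+ (for associative arrays) and no commas inside file names; the function name changed_or_new is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: requires bash 4+ for associative arrays (local -A).
# changed_or_new is a hypothetical helper name, not part of any tool.
changed_or_new() {
    local old_csv=$1 new_csv=$2
    local -A old_hash=()   # file name -> hash, loaded from the old file
    local name hash

    # One pass over the old file; "|| [[ -n $name ]]" keeps a last line
    # that has no trailing newline.
    while IFS=, read -r name hash || [[ -n $name ]]; do
        [[ -z $name ]] && continue   # empty keys are not allowed in bash arrays
        old_hash[$name]=$hash
    done < "$old_csv"

    # One pass over the new file: print the name if it is missing from the
    # old file or if its hash differs from the stored one.
    while IFS=, read -r name hash || [[ -n $name ]]; do
        [[ -z $name ]] && continue
        if [[ -z ${old_hash[$name]+set} || ${old_hash[$name]} != "$hash" ]]; then
            printf '%s\n' "$name"
        fi
    done < "$new_csv"
}

# Usage: changed_or_new first.csv second.csv
```

With the sample files this prints BlockchainContextHelp.tsv and new1.tsv. The header lines are compared too, which is harmless here because both headers are identical.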
Answer 0 (score: 1)
awk is a better tool for this kind of text processing. You can use:
awk 'BEGIN {
    FS = OFS = ","                   # set the input/output field separator to ,
}
NR == FNR {                          # while processing the first file
    map[$1] = $2                     # store the second column keyed by the first
    next                             # move on to the next record
}
!($1 in map) || map[$1] != $2 {      # in the 2nd file: $1 is not in map, or its
                                     # 2nd column differs from the hash
                                     # stored in map
    print $1                         # print the first column
}' first.csv second.csv
Output:
BlockchainContextHelp.tsv
new1.tsv
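If the two cases in the question (changed hash vs. new file) should be reported separately, the same two-file awk idiom can label each name. This is a minimal sketch wrapped in a hypothetical shell function, hash_diff_labeled; like the answer, it assumes no commas inside file names and a non-empty first file (if the first file has no lines, NR == FNR also holds while reading the second file, so nothing is printed).

```shell
#!/usr/bin/env bash
# Sketch only: hash_diff_labeled is a hypothetical helper name.
# $1 = old CSV (e.g. first.csv), $2 = new CSV (e.g. second.csv)
hash_diff_labeled() {
    awk 'BEGIN { FS = "," }
    NR == FNR     { old[$1] = $2; next }          # first file: remember name -> hash
    !($1 in old)  { print "new:", $1; next }      # name never seen in the first file
    old[$1] != $2 { print "changed:", $1 }        # name seen, but the hash differs
    ' "$1" "$2"
}

# Usage: hash_diff_labeled first.csv second.csv
```

For the sample data this prints "changed: BlockchainContextHelp.tsv" followed by "new: new1.tsv", in the order the names appear in second.csv.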