How to find the difference between a CSV file and a file containing only one of its columns

Date: 2010-07-30 18:57:50

Tags: diff replace

I have a CSV file with some user data, like this:

"10333","","an.10","Kenyata","","Aaron","","","","","","","","","",""
"12222","","an.4","Wendy","","Aaron","","","","","","","","","",""
"14343","","aaron.5","Nanci","","Aaron","","","","","","","","","",""

I also have a file with one item per line, like this:

an.10
aaron.5

All I want is to find the rows of the CSV file whose identifier (the third field) appears in the list file.

So the desired output is:

"10333","","an.10","Kenyata","","Aaron","","","","","","","","","",""
"14343","","aaron.5","Nanci","","Aaron","","","","","","","","","",""

(Note that the an.4 row is not included in this output.)

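Any environment is available to me, and I'm willing to try anything short of doing it by hand: the CSV contains millions of records and the list has about 100k entries.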

2 Answers:

Answer 0 (score: 1)

How unique are identifiers like an.10?
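If "unique" here means whether the same identifier can occur on more than one row, one quick check is to pull out the third field and look for repeats. This is a minimal sketch, assuming (as in the sample data) that the identifier is the third comma-separated field and that no field contains an embedded comma; `duplicate_ids` is a hypothetical helper name.

```shell
# Sketch: print each identifier value that occurs on more than one CSV row.
# Assumes the identifier is the third comma-separated field and that no
# field contains an embedded comma. duplicate_ids is a hypothetical name.
duplicate_ids() {
    # cut extracts field 3 (quotes included), sort groups equal values,
    # and uniq -d prints one copy of every value that appears twice or more.
    cut -d',' -f3 "$1" | sort | uniq -d
}
```

Running `duplicate_ids data.csv` prints each repeated identifier once, with its quotes; empty output means every identifier occurs on a single row.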

Perhaps a very small *nix shell script is enough:

for i in $(sort -u list.txt); do grep -F "\"$i\"" data.csv; done

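For each unique entry in the list, this prints every CSV line that contains that entry in double quotes. It does not, however, check that the match is in the identifier column (the third field); that could be done with awk, for example.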
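Restricting the match to the identifier column with awk could look roughly like the sketch below. It loads the list into an associative array and then reads data.csv once, instead of running grep once per list entry (about 100k passes over millions of rows). It assumes every field is double-quoted and no field contains an embedded comma, as in the sample; `filter_by_id` is a hypothetical helper name.

```shell
# Sketch, not a drop-in tool: print the rows of a CSV file whose third
# field (the identifier column) appears in a list file, one id per line.
# Assumes every field is wrapped in double quotes and no field contains an
# embedded comma, as in the sample data. filter_by_id is a hypothetical name.
filter_by_id() {
    awk -F',' '
        # While reading the first file (the list), store each non-empty id,
        # re-wrapped in quotes so it compares equal to the CSV field.
        FILENAME == ARGV[1] {
            if ($0 != "") ids["\"" $0 "\""] = 1
            next
        }
        # While reading the second file (the CSV), print the row when its
        # third field is one of the stored ids.
        $3 in ids
    ' "$1" "$2"
}
```

Usage: `filter_by_id list.txt data.csv > matches.csv`. Comparing whole fields means an id that happens to appear in some other column is not matched. If the real CSV has quoted fields containing commas, splitting on `-F','` is no longer safe and a proper CSV parser would be needed.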

Answer 1 (score: 1)

If the CSV file is data.csv and the list file is list.txt, I would do this:

while IFS= read -r i; do grep -F "\"$i\"" data.csv; done < list.txt
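A loop like this still scans data.csv once per list entry. A single-pass variant, sketched here under the same sample-data assumptions, hands all entries to grep at once with `-F -f` (fixed strings read from a pattern file). Each entry is wrapped in double quotes first so that, e.g., an.1 cannot match inside an.10, and blank lines are dropped because `""` would match every row's empty fields. Like the loop, it does not check which column matched. `match_quoted_ids` is a hypothetical helper name.

```shell
# Sketch: single-pass fixed-string search. grep -F -f reads every pattern
# from a file and scans the CSV once instead of once per list entry.
# match_quoted_ids is a hypothetical name; $1 is the list, $2 the CSV.
match_quoted_ids() {
    patterns=$(mktemp) || return 1
    # Drop blank lines, then wrap each id in literal double quotes,
    # e.g. an.10 becomes "an.10", so it only matches a whole quoted field.
    sed '/^$/d; s/.*/"&"/' "$1" > "$patterns"
    grep -F -f "$patterns" "$2"
    status=$?
    rm -f "$patterns"
    # Keep grep's exit status: 0 if something matched, 1 if nothing did.
    return "$status"
}
```

Usage: `match_quoted_ids list.txt data.csv > matches.csv`.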