Question

I have 2 CSV files. One has several columns; the other has a single column of domains. Simplified data for these files would be

file1.csv:

John,example.org,MyCompany,Australia
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US

file2.csv:

example.org
google.es
mysite.uk

The output should be

Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US

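I tried this solution: grep -v -f file2.csv file1.csv >output-file

which I found here: http://www.unix.com/shell-programming-and-scripting/177207-removing-duplicate-records-comparing-2-csv-files.html

But since there is no explanation of how the script works, and I am bad at shell, I could not adapt it to work for me.

A solution would be highly appreciated, and a solution with some explanation would be awesome! :)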

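Edit:

I have tried the line that was supposed to work, but for some reason it does not. Here is the output from my terminal. What is wrong with this?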

Desktop $ cat file1.csv ; echo
John,example.org,MyCompany,Australia
Lenny ,domain.com,OtherCompany,US
Martha,mysite.com,ThirCompany,US
Desktop $ cat file2.csv ; echo
example.org
google.es
mysite.uk
Desktop $ grep -v -f file2.csv file1.csv
John,example.org,MyCompany,Australia
Lenny ,domain.com,OtherCompany,US
Martha,mysite.com,ThirCompany,US

Why doesn't grep remove the line

John,example.org,MyCompany,Australia

Answer 1

One in awk:

$ awk -F, 'NR==FNR{a[$1];next}($2 in a==0)' file2 file1
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US

Explanation:

$ awk -F, '    # using awk, comma-separated records
NR==FNR {      # process the first file, file2
    a[$1]      # hash the domain to a
    next       # proceed to next record
}
($2 in a==0)   # process file1, if domain in $2 not in a, print the record
' file2 file1  # file order is important
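
The same filter can be tried end to end with throwaway files. The sketch below is one way to do that, assuming a POSIX shell and awk. The temporary directory and the variable name result are illustrative and not part of the original answer, and !($2 in a) is used as an equivalent spelling of ($2 in a==0).

```shell
#!/bin/sh
# Work in a throwaway directory so no real files are touched (illustrative setup).
tmpdir=$(mktemp -d)

cat > "$tmpdir/file1.csv" <<'EOF'
John,example.org,MyCompany,Australia
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
EOF

cat > "$tmpdir/file2.csv" <<'EOF'
example.org
google.es
mysite.uk
EOF

# First file (file2.csv): remember every domain as an array key.
# Second file (file1.csv): print only records whose 2nd field is not a key.
result=$(awk -F, 'NR==FNR { a[$1]; next } !($2 in a)' \
    "$tmpdir/file2.csv" "$tmpdir/file1.csv")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because the comparison is made on the whole second field, a domain listed in file2.csv only removes records whose domain is exactly that string, not records that merely contain it somewhere in the line.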

Answer 2

The line you posted works just fine.

$ grep -v -f file2.csv file1.csv
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US

Here is an explanation. grep searches the given file for the given pattern and prints every line that matches. The simplest usage example is:

$ grep John file1.csv
John,example.org,MyCompany,Australia

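Here we used a simple pattern that is matched character for character, but you can also use regular expressions (basic, extended, and even Perl-compatible ones).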
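
As a small illustration of the regular-expression case, the sketch below uses an extended regular expression (-E) anchored to the start of the line, so that only the first field is tested. The sample file is recreated in a temporary directory, and the variable name matched is an illustrative assumption.

```shell
#!/bin/sh
# Recreate the sample data in a throwaway directory (illustrative setup).
tmpdir=$(mktemp -d)

cat > "$tmpdir/file1.csv" <<'EOF'
John,example.org,MyCompany,Australia
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
EOF

# -E enables extended regular expressions; ^ anchors at the line start,
# and (John|Martha) matches either name, followed by the field separator.
matched=$(grep -E '^(John|Martha),' "$tmpdir/file1.csv")

printf '%s\n' "$matched"
rm -rf "$tmpdir"
```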

To invert the logic and print only the lines that do not match, we use the -v switch, like this:

$ grep -v John file1.csv
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US

To specify more than one pattern, you can use the -e pattern option multiple times, like this:

$ grep -v -e John -e Lenny file1.csv 
Martha,site.com,ThirdCompany,US

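However, if there is a larger number of patterns to check, we might use the -f file option, which reads all the patterns from the specified file.

So, when we combine all of this, reading the patterns from a file with -f and inverting the matching logic with -v, we get the line you need.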
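
One caveat, offered as a hedged aside rather than as part of the original answer: -f treats each line of the pattern file as a regular expression, so the dot in example.org matches any character, and an invisible trailing character in the pattern file (for example a carriage return from Windows line endings) can keep a pattern from matching at all. Whether that is what happened in the question's edit is only an assumption. The sketch below shows one way to guard against both, using -F for literal matching and tr to strip carriage returns; the file name patterns.txt and the variable name filtered are illustrative.

```shell
#!/bin/sh
# Recreate the sample data in a throwaway directory (illustrative setup).
tmpdir=$(mktemp -d)

cat > "$tmpdir/file1.csv" <<'EOF'
John,example.org,MyCompany,Australia
Lenny,domain.com,OtherCompany,US
Martha,site.com,ThirdCompany,US
EOF

# Simulate a pattern file saved with Windows (CRLF) line endings.
printf 'example.org\r\ngoogle.es\r\nmysite.uk\r\n' > "$tmpdir/file2.csv"

# Strip the carriage returns so each pattern is just the domain.
tr -d '\r' < "$tmpdir/file2.csv" > "$tmpdir/patterns.txt"

# -F: treat patterns as fixed strings, -f: read them from a file,
# -v: print only the lines that match none of them.
filtered=$(grep -v -F -f "$tmpdir/patterns.txt" "$tmpdir/file1.csv")

printf '%s\n' "$filtered"
rm -rf "$tmpdir"
```

Note that even with -F, grep still looks for each pattern anywhere in the line rather than only in the domain column.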

Compare 2 CSV files and remove rows - Shell
