Question

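I have two CSV files, F1 and F2, whose rows are in the same order. I want to extract the rows that were changed or added in F2 by comparing F1 and F2.

I tried the diff command, but it only shows me the changes. How can I parse its output and extract the lines from F2?

F1 (file 1):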

1234,Joe,pieter,joe@gmail.com,male,22
1235,Shally,Jonse,shally@yahoo.com,female,24
1235,Harry,poter,harry@gmail.com,male,21
1235,Helen,Jairag,helen@gmail.com,female,21
2585,Dinesh,Jairag,helen@gmail.com,female,21

F2 (file 2):

1234,Joe,pieter,joe@gmail.com,male,22
1235,Shally,Jonse,shally@yahoo.com,female,24
1235,Harry,Potter,harry@gmail.com,male,21
1235,Helen,Jairag,helen@gmail.com,female,21

Command executed:

diff F2 F1

Output:

3c3
< 1235,Harry,Potter,harry@gmail.com,male,21
---
> 1235,Harry,poter,harry@gmail.com,male,21
4a5
> 2585,Dinesh,Jairag,helen@gmail.com,female,21

Expected output in file F3:

1235,Harry,poter,harry@gmail.com,male,21
2585,Dinesh,Jairag,helen@gmail.com,female,21

Answer 1

diff --changed-group-format='%<' --unchanged-group-format='' file1 file2
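As a rough sketch of how this behaves on the sample data from the question: the script below assumes that file1 corresponds to F1 and file2 to F2, and that GNU diff is installed (the two group-format options are GNU extensions and may be missing from BSD or BusyBox diff). The function name lines_from_first, the scratch directory, and the redirect to F3 are illustrative additions, not part of the answer.

```shell
#!/bin/sh
# lines_from_first OLD NEW (hypothetical helper name):
# print only the lines of OLD that belong to a differing group.
lines_from_first() {
    # %< emits a differing group as it appears in the first file;
    # the empty unchanged-group format suppresses the shared lines.
    diff --changed-group-format='%<' --unchanged-group-format='' "$1" "$2"
}

# Recreate the question's sample files in a scratch directory.
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

cat > F1 <<'EOF'
1234,Joe,pieter,joe@gmail.com,male,22
1235,Shally,Jonse,shally@yahoo.com,female,24
1235,Harry,poter,harry@gmail.com,male,21
1235,Helen,Jairag,helen@gmail.com,female,21
2585,Dinesh,Jairag,helen@gmail.com,female,21
EOF

cat > F2 <<'EOF'
1234,Joe,pieter,joe@gmail.com,male,22
1235,Shally,Jonse,shally@yahoo.com,female,24
1235,Harry,Potter,harry@gmail.com,male,21
1235,Helen,Jairag,helen@gmail.com,female,21
EOF

# diff exits with status 1 when the files differ; only a higher
# status signals a real error, so status 1 is accepted here.
lines_from_first F1 F2 > F3 || [ "$?" -eq 1 ] || exit 1
cat F3
```

With F1 as the first argument, F3 should end up holding the two lines the question lists as expected output. Swapping the arguments would print the F2 side of each differing group instead.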

Answer 2

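As I understand it, you want to extract the changed/added lines from File2.
In your example, File2 contains only one changed line and no added lines. The basic call pattern of diff is diff old new, and the output tells you which updates have to be applied to old to obtain new. So, to find out what is different in File2, pass it as the second argument. I suggest using diff's -u option. It marks every line of File2 that would have to be changed or added in File1 with a + in the first column: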

diff -u File1 File2

gives

--- File1 2012-08-22 11:30:07.000000000 +0200
+++ File2 2012-08-22 11:30:25.000000000 +0200
@@ -1,5 +1,4 @@
 1234,Joe,pieter,joe@gmail.com,male,22
 1235,Shally,Jonse,shally@yahoo.com,female,24
-1235,Harry,poter,harry@gmail.com,male,21
+1235,Harry,Potter,harry@gmail.com,male,21
 1235,Helen,Jairag,helen@gmail.com,female,21
-2585,Dinesh,Jairag,helen@gmail.com,female,21

Now keep only the lines that start with +, skipping the first two lines (the --- and +++ header):

diff -u File1 File2 | \
awk 'NR > 2 && $0 ~ /^\+/ {print substr($0,2)}'

gives

1235,Harry,Potter,harry@gmail.com,male,21

Or the other way round, which yields the expected output from the question:

diff -u File2 File1 | \
awk 'NR > 2 && $0 ~ /^\+/ {print substr($0,2)}'

gives

1235,Harry,poter,harry@gmail.com,male,21
2585,Dinesh,Jairag,helen@gmail.com,female,21
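The pipeline can also be wrapped in a small function so that both directions are easy to try. This is only a sketch: the function name added_lines and the scratch directory are made up for illustration, and the sample files are copied from the question.

```shell
#!/bin/sh
# added_lines OLD NEW (hypothetical helper name):
# print the lines that diff -u marks with a leading "+", i.e. lines
# that are new or changed in NEW relative to OLD.
added_lines() {
    # NR > 2 skips the "---" / "+++" header; substr drops the "+" marker.
    diff -u "$1" "$2" | awk 'NR > 2 && /^\+/ { print substr($0, 2) }'
}

# Recreate the question's sample files in a scratch directory.
workdir=$(mktemp -d) || exit 1
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

cat > F1 <<'EOF'
1234,Joe,pieter,joe@gmail.com,male,22
1235,Shally,Jonse,shally@yahoo.com,female,24
1235,Harry,poter,harry@gmail.com,male,21
1235,Helen,Jairag,helen@gmail.com,female,21
2585,Dinesh,Jairag,helen@gmail.com,female,21
EOF

cat > F2 <<'EOF'
1234,Joe,pieter,joe@gmail.com,male,22
1235,Shally,Jonse,shally@yahoo.com,female,24
1235,Harry,Potter,harry@gmail.com,male,21
1235,Helen,Jairag,helen@gmail.com,female,21
EOF

echo "Changed/added in F2 relative to F1:"
added_lines F1 F2

echo "Changed/added in F1 relative to F2 (written to F3):"
added_lines F2 F1 > F3
cat F3
```

Skipping the header by line number rather than by pattern means a data line that itself begins with ++ is not mistaken for the +++ header.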

Extract modified and added lines from two CSV files using shell or the diff command
