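I'm working with a large amount of data (a few hundred lines per check) and want to know the most efficient way to compare two different sets of data.
What I'm trying to do is find the differences between the following:
From source 1: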
site1.49729 site2.80124 /path/path/path/path
site1.49730 site2.80125 /path/path/path/path
site1.49734 site2.80126 /path/path/path/path
site1.49735 site2.80127 /path/path/path/path
site1.49736 site2.80128 /path/path/path/path
site1.49737 site2.80129 /path/path/path/path
site1.49738 site2.80131 /path/path/path/path
site1.49752 site2.80171 /path/path/path/path
From source 2:
site1.49729 site2.80124 /path/path/path/path
site1.49730 site2.80125 /path/path/path/path
site1.49734 **site2.1234** /path/path/path/path
site1.49735 site2.80127 /path/path/path/path
site1.49736 site2.80128 /path/path/path/path
site1.49737 **site2.12345** /path/path/path/path
site1.49738 site2.80131 /path/path/path/path
site1.49752 site2.80171 /path/path/path/path
**site1.49735 site2.99999 /path/path/path/path**
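The differences are highlighted with **.
What is the most efficient way to make sure that nothing in the second column of either command's output is missing, and that #2 matches the records exactly?
Any ideas on where to start?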
Answer 0 (score: 0)
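I'd suggest simply running diff on source 1 and source 2. It will show the lines that differ. Put the contents of source 1 in s1.txt and the contents of source 2 in s2.txt, then run the command: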
$ diff -y s1.txt s2.txt
This prints the two files side by side and marks the lines that differ between them.
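If the comparison needs to run inside a script rather than at the shell, the same line-level idea can be sketched in Python with the standard difflib module. This is only an illustration, not a replacement for diff: the function name changed_lines and the two-line sample lists are made up for the example, and the ** markers from the question are left out of the data.

```python
import difflib


def changed_lines(old_lines, new_lines):
    """Return (removed, added): lines only in old_lines and lines only in new_lines."""
    removed, added = [], []
    for entry in difflib.ndiff(old_lines, new_lines):
        if entry.startswith("- "):
            removed.append(entry[2:])   # present in source 1 only
        elif entry.startswith("+ "):
            added.append(entry[2:])     # present in source 2 only
        # entries starting with "  " (unchanged) or "? " (hints) are skipped
    return removed, added


# Hypothetical excerpt of the two sources from the question.
source1 = [
    "site1.49734 site2.80126 /path/path/path/path",
    "site1.49735 site2.80127 /path/path/path/path",
]
source2 = [
    "site1.49734 site2.1234 /path/path/path/path",
    "site1.49735 site2.80127 /path/path/path/path",
]

removed, added = changed_lines(source1, source2)
for line in removed:
    print("<", line)
for line in added:
    print(">", line)
```

For real files, the lists could be built with something like open("s1.txt").read().splitlines(), assuming the file names used above.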
Answer 1 (score: 0)
Use the 'diff' command. For your case it produces output like the following:
3c3
< site1.49734 site2.80126 /path/path/path/path
---
> site1.49734 **site2.1234** /path/path/path/path
6c6
< site1.49737 site2.80129 /path/path/path/path
---
> site1.49737 **site2.12345** /path/path/path/path
8c8,9
< site1.49752 site2.80171 /path/path/path/path
\ No newline at end of file
---
> site1.49752 site2.80171 /path/path/path/path
> **site1.49735 site2.99999 /path/path/path/path**
Many text editors also provide a GUI for diffing files or viewing the differences (for example, Notepad++).
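Since the question is really about the second column, a column-aware check may be easier to read than a whole-line diff. The following is a minimal Python sketch, assuming the fields are whitespace-separated and that the first column acts as the record key. The function names are hypothetical, and because a key can occur more than once (site1.49735 appears twice in source 2), each key maps to a set of second-column values.

```python
from collections import defaultdict


def index_by_first_column(lines):
    """Map each column-1 value to the set of column-2 values seen with it."""
    index = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:            # skip blank or malformed lines
            index[fields[0]].add(fields[1])
    return index


def compare_second_column(source1_lines, source2_lines):
    """Return {key: (values_in_source1, values_in_source2)} for every key that differs."""
    first = index_by_first_column(source1_lines)
    second = index_by_first_column(source2_lines)
    mismatches = {}
    for key in sorted(first.keys() | second.keys()):
        a = first.get(key, set())       # empty set if the key is missing
        b = second.get(key, set())
        if a != b:
            mismatches[key] = (sorted(a), sorted(b))
    return mismatches


if __name__ == "__main__":
    # Assumes the two sources were saved as s1.txt and s2.txt.
    with open("s1.txt") as f1, open("s2.txt") as f2:
        result = compare_second_column(f1.read().splitlines(), f2.read().splitlines())
    for key, (in_s1, in_s2) in result.items():
        print(key, "source 1:", in_s1, "source 2:", in_s2)
```

A key that exists in only one source shows up with an empty list on the other side, which covers the "nothing is missing" part of the question. Unlike diff, this ignores line order and the third column.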