How to find the differences when comparing two sets of similar lines

Time: 2013-02-15 09:30:37

Tags: unix grep

I have a log file that contains the following log statements,

e.g.

Before starting transaction id = <unique number>
After starting transaction id = <unique number>

....

Before starting transaction id = <unique number>
After starting transaction id = <unique number>

When I do a simple grep for "Before", I see 400 statements, but when I do a simple grep for "After", I see 402 statements.

How can I find the statements that do not occur in pairs?

3 answers:

Answer 0: (score: 2)

Extract the Before and After IDs, then diff them, like this:

$ diff -wb <(grep Before file | cut -d= -f2 | sort) <(grep After file | cut -d= -f2 | sort)

If your shell does not support process substitution, i.e. <(...), use temporary files:

$ grep Before file | cut -d= -f2 | sort > before
$ grep After file | cut -d= -f2 | sort > after
$ diff -wb before after
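
The diff output shows which IDs differ, but it can be hard to read when many lines are involved. As a possible variation (this swaps diff for comm, which is not what the answer itself uses), the sketch below lists the IDs that appear only on one side. The sample log, its IDs, and the helper name extract_ids are made up for illustration.

```shell
# Hypothetical sample data: IDs 2 and 3 are each missing one half of the pair.
log=$(mktemp)
cat > "$log" <<'EOF'
Before starting transaction id = 1
After starting transaction id = 1
Before starting transaction id = 2
After starting transaction id = 3
Before starting transaction id = 4
After starting transaction id = 4
EOF

# extract_ids WORD FILE: print the sorted IDs of all lines that start with WORD.
extract_ids() {
  grep "^$1" "$2" | cut -d= -f2 | tr -d ' ' | sort
}

before_ids=$(mktemp)
after_ids=$(mktemp)
extract_ids Before "$log" > "$before_ids"
extract_ids After "$log" > "$after_ids"

# comm needs sorted input; -23 keeps lines found only in the first file,
# -13 keeps lines found only in the second file.
only_before=$(comm -23 "$before_ids" "$after_ids")
only_after=$(comm -13 "$before_ids" "$after_ids")

echo "Before without After: $only_before"
echo "After without Before: $only_after"

rm -f "$log" "$before_ids" "$after_ids"
```

Both inputs are sorted by the same sort call in the same locale, which is what comm expects.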

Answer 1: (score: 2)

If each Before and its matching After are supposed to share the same unique number, then

awk -F= '{a[$2]++;}END{for(i in a)if(a[i]!=2)print "id:"i}' file

will print every ID that does not occur exactly twice, i.e. the IDs that are not paired.

e.g.:

kent$  cat file
Before starting transaction id = 1
After starting transaction id = 1
Before starting transaction id = 2
After starting transaction id = 2
Before starting transaction id = 3
Before starting transaction id = 4
After starting transaction id = 4
After starting transaction id = 5

kent$  awk -F= '{a[$2]++;}END{for(i in a)if(a[i]!=2)print "id:"i}' file
id: 3
id: 5
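
The one-liner reports the unpaired IDs but not which half of the pair is missing. A minimal sketch of one way to add that, assuming every line starts with either Before or After and ends with "= <id>"; the function name report_unpaired and the output wording are illustrative only:

```shell
# Same sample data as in the example above, written to a temporary file.
log=$(mktemp)
cat > "$log" <<'EOF'
Before starting transaction id = 1
After starting transaction id = 1
Before starting transaction id = 2
After starting transaction id = 2
Before starting transaction id = 3
Before starting transaction id = 4
After starting transaction id = 4
After starting transaction id = 5
EOF

# report_unpaired FILE: count Before and After lines per ID separately
# and print every ID whose two counts differ.
report_unpaired() {
  awk -F' *= *' '
    /^Before/ { before[$2]++; seen[$2] = 1 }
    /^After/  { after[$2]++;  seen[$2] = 1 }
    END {
      for (id in seen)
        if (before[id] != after[id])
          printf "id %s: %d Before, %d After\n", id, before[id], after[id]
    }' "$1" | sort   # awk does not guarantee any order for "for (id in ...)"
}

unpaired=$(report_unpaired "$log")
echo "$unpaired"
rm -f "$log"
```

Unlike the count-must-be-2 test, this also flags an ID that was logged twice as Before and never as After.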

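Answer 2: (score: 1)

grep is not the best tool for this job either, because it cannot read across multiple lines. You could use -B1 to read the lines in pairs, but you would still need a more powerful tool (such as sed, awk, or something else) to parse them.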
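
As a rough sketch of what such parsing could look like in awk, the function below walks the log in order and reports every line that breaks the Before/After alternation. It assumes each Before is immediately followed by its own After with no other transaction in between, which may not hold for the real log; the name check_alternation and the message texts are hypothetical.

```shell
# Hypothetical sample data: line 3 is an After with no preceding Before.
log=$(mktemp)
cat > "$log" <<'EOF'
Before starting transaction id = 123
After starting transaction id = 123
After starting transaction id = 54675
Before starting transaction id = 567
After starting transaction id = 567
EOF

# check_alternation FILE: remember the ID of the last unanswered Before
# and complain whenever the next line does not close it.
check_alternation() {
  awk -F' *= *' '
    /^Before/ {
      if (pending != "")
        print "line " pending_nr ": Before without After, id " pending
      pending = $2; pending_nr = NR
      next
    }
    /^After/ {
      if (pending == $2) { pending = ""; next }   # pair is complete
      print "line " NR ": After without matching Before, id " $2
    }
    END {
      if (pending != "")
        print "line " pending_nr ": Before without After, id " pending
    }' "$1"
}

broken=$(check_alternation "$log")
echo "$broken"
rm -f "$log"
```

Because it reports line numbers, this makes it easier to jump to the offending place in the log, at the cost of the adjacency assumption.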

Here is another approach, which also works if you get extraneous Before lines (the echo is only there so that you can dry-run it):

$ echo 'Before starting transaction id = 123
After starting transaction id = 123
After starting transaction id = 54675
Before starting transaction id = 567
After starting transaction id = 567' | 
  sort -k6 | uniq -u -f5 # end cmd
After starting transaction id = 54675

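This works by keeping only the lines whose ID is unique: sort -k6 orders the lines by the ID in field 6, and uniq -u -f5 skips the first five fields when comparing and prints only the lines that have no neighbor with the same ID. Since I don't know what kind of extra entries you are getting, they may be duplicates of existing entries, in which case you have to do it differently. Here is a safer approach that catches both cases and returns the entries whose ID occurs more or fewer than 2 times: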

$ echo 'Before starting transaction id = 123
After starting transaction id = 123
After starting transaction id = 567
Before starting transaction id = 567
After starting transaction id = 567' | 
  sort -k6 | uniq -c -f5 | grep -v "^[[:space:]]*2[[:space:]]"
3 After starting transaction id = 567
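
If only the ID and its count are of interest, the grep filter can be replaced by a short awk test on the count column. This is a sketch under the same assumption as the pipeline above (the ID is the sixth whitespace-separated field and the last one on the line); the function name odd_ids is made up.

```shell
# odd_ids: read log lines on stdin and print "ID COUNT" for every ID
# that does not occur exactly twice.
odd_ids() {
  # uniq -c prefixes each group with its count, so $1 is the count
  # and $NF (the last field) is the ID.
  sort -k6 | uniq -c -f5 | awk '$1 != 2 { print $NF, $1 }'
}

result=$(printf '%s\n' \
  'Before starting transaction id = 123' \
  'After starting transaction id = 123' \
  'After starting transaction id = 567' \
  'Before starting transaction id = 567' \
  'After starting transaction id = 567' | odd_ids)
echo "$result"
```

Comparing the count numerically avoids depending on how many leading spaces a particular uniq implementation prints before the count.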