Question

I have a log file that contains the following log statements,

e.g.

Before starting transaction id = <unique number>
After starting transaction id = <unique number>

....

Before starting transaction id = <unique number>
After starting transaction id = <unique number>

When I do a simple grep for "Before" I see 400 statements, but when I do a simple grep for "After" I see 402 statements.

How can I find the statements that do not occur in pairs?

Answer 1

Extract the Before and After IDs, then diff them, like this:

$ diff -wb <(grep Before file | cut -d= -f2 | sort) <(grep After file | cut -d= -f2 | sort)

If your shell does not support process substitution, i.e. <(...), use temporary files instead:

$ grep Before file | cut -d= -f2 | sort > before
$ grep After file | cut -d= -f2 | sort > after
$ diff -wb before after
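
If you want to see at a glance which side an unpaired ID belongs to, comm can stand in for diff. This is only a minimal sketch and not part of the original answer: the function name unpaired_ids is made up for illustration, it relies on mktemp being available, and it assumes each log line contains a single "=" followed by the ID.

```shell
# Hypothetical helper (the name is an assumption, not from the answer above).
# Prints IDs seen only in "Before" lines in column 1 and IDs seen only in
# "After" lines in column 2 (prefixed with a tab). IDs present on both
# sides are suppressed by comm -3.
unpaired_ids() {
    log=$1
    before=$(mktemp) && after=$(mktemp) || return 1
    # Same extraction as above, with the space after "=" stripped.
    grep Before "$log" | cut -d= -f2 | tr -d ' ' | sort > "$before"
    grep After  "$log" | cut -d= -f2 | tr -d ' ' | sort > "$after"
    comm -3 "$before" "$after"
    rm -f "$before" "$after"
}
```

Call it as unpaired_ids file. Because comm compares line by line, an ID that appears twice on one side and once on the other also shows up as unpaired.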

Answer 2

If a paired Before and After are supposed to have the same unique number:

awk -F= '{a[$2]++;}END{for(i in a)if(a[i]!=2)print "id:"i}' file

will print the IDs that are not paired.

e.g.:

kent$  cat file
Before starting transaction id = 1
After starting transaction id = 1
Before starting transaction id = 2
After starting transaction id = 2
Before starting transaction id = 3
Before starting transaction id = 4
After starting transaction id = 4
After starting transaction id = 5

kent$  awk -F= '{a[$2]++;}END{for(i in a)if(a[i]!=2)print "id:"i}' file
id: 3
id: 5
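
The one-liner above reports that an ID is unpaired but not which half is missing. A possible variant that counts the two kinds of line separately is sketched below. The function name report_unpaired and the output format are my own assumptions, and it only looks at lines that start with "Before" or "After", so other log lines are ignored.

```shell
# Hypothetical helper (name and output format are assumptions).
# For every ID whose Before/After counts are not exactly 1 and 1,
# print both counts so the missing or duplicated side is visible.
report_unpaired() {
    awk -F'= *' '
        /^Before/ { before[$2]++; seen[$2] = 1 }
        /^After/  { after[$2]++;  seen[$2] = 1 }
        END {
            for (id in seen)
                if (before[id] != 1 || after[id] != 1)
                    printf "id %s: before=%d after=%d\n", id, before[id], after[id]
        }
    ' "$1" | sort   # awk iterates in unspecified order, so sort for stable output
}
```

Run against the sample file above, report_unpaired file should flag ID 3 as having a Before but no After, and ID 5 as the reverse. Trailing whitespace or carriage returns after the ID would be treated as part of it, so strip those first if the log has them.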

Answer 3

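grep is not the best tool for this job either, since it cannot read across multiple lines. You could use -B1 to read the lines in pairs, but you would still need a more capable tool (such as sed, awk or others) to parse them.

Here is another approach, which also works if you get extraneous Before lines (the echo is only there so you can dry-run it):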
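
To illustrate what such a "more capable tool" could look like, here is a rough awk sketch that walks the log in order and matches each After against an earlier Before with the same ID. It is not taken from any of the answers: the function name check_order and the message wording are invented, and it assumes the relevant lines start with "Before" or "After" and end with "= <id>".

```shell
# Hypothetical helper (name and messages are assumptions).
# Tracks Before lines that are still waiting for their After, reports
# any After that has no pending Before, and lists leftovers at the end.
check_order() {
    awk -F'= *' '
        /^Before/ { pending[$2]++ }
        /^After/  {
            if (pending[$2] > 0)
                pending[$2]--
            else
                printf "line %d: After without preceding Before, id %s\n", NR, $2
        }
        END {
            for (id in pending)
                if (pending[id] > 0)
                    printf "Before without After, id %s (%d unmatched)\n", id, pending[id]
        }
    ' "$1"
}
```

Unlike a plain count, this also catches an After that is logged ahead of its Before, at the cost of assuming that order matters in your log.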

$ echo 'Before starting transaction id = 123
After starting transaction id = 123
After starting transaction id = 54675
Before starting transaction id = 567
After starting transaction id = 567' | 
  sort -k6 | uniq -u -f5 # end cmd
After starting transaction id = 54675

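It works by checking only for unique IDs. Since I don't know what kind of stray entries you are getting, they may be duplicates of existing entries, in which case you would have to do it differently. Here is a safer approach that catches both cases and returns the entries whose ID occurs more or fewer than 2 times: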

$ echo 'Before starting transaction id = 123
After starting transaction id = 123
After starting transaction id = 567
Before starting transaction id = 567
After starting transaction id = 567' | 
  sort -k6 | uniq -c -f5 | grep -v "^[[:space:]]*2[[:space:]]"
3 After starting transaction id = 567
