I have a log file that contains log statements like the following, e.g.:
Before starting transaction id = <unique number>
After starting transaction id = <unique number>
....
Before starting transaction id = <unique number>
After starting transaction id = <unique number>
When I do a simple grep for "Before" I see 400 statements, but when I grep for "After" I see 402 statements.
How can I find the statements that do not occur in pairs?
Answer 0 (score: 2)
Extract the Before and After IDs, then diff them, like this:
$ diff -wb <(grep Before file | cut -d= -f2 | sort) <(grep After file | cut -d= -f2 | sort)
If your shell does not support process substitution, i.e. <(...), use temporary files instead:
$ grep Before file | cut -d= -f2 | sort > before
$ grep After file | cut -d= -f2 | sort > after
$ diff -wb before after
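For example, assuming a small hypothetical log where id 3 has only a Before line and id 5 has only an After line, the diff marks each unpaired ID with < (present only among the Before lines) or > (present only among the After lines):
$ cat file
Before starting transaction id = 1
After starting transaction id = 1
Before starting transaction id = 3
After starting transaction id = 5
$ diff -wb <(grep Before file | cut -d= -f2 | sort) <(grep After file | cut -d= -f2 | sort)
2c2
<  3
---
>  5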
Answer 1 (score: 2)
If a Before/After pair should carry the same unique number:
awk -F= '{a[$2]++;}END{for(i in a)if(a[i]!=2)print "id:"i}' file
will print those unpaired IDs.
e.g.:
kent$ cat file
Before starting transaction id = 1
After starting transaction id = 1
Before starting transaction id = 2
After starting transaction id = 2
Before starting transaction id = 3
Before starting transaction id = 4
After starting transaction id = 4
After starting transaction id = 5
kent$ awk -F= '{a[$2]++;}END{for(i in a)if(a[i]!=2)print "id:"i}' file
id: 3
id: 5
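If you also want to know which half of the pair is missing, a small variant of the same idea (a sketch, assuming each ID occurs at most once as a Before line and at most once as an After line) counts the two keywords separately:
$ awk -F= '/^Before/{b[$2]++} /^After/{a[$2]++} END{for(i in b)if(!(i in a))print "no After for id:"i; for(i in a)if(!(i in b))print "no Before for id:"i}' file
no After for id: 3
no Before for id: 5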
Answer 2 (score: 1)
grep is not the best tool for this job either, since it cannot match across multiple lines. You could read the statements in pairs with -B1, but you would still need a more powerful tool such as sed, awk, or something similar to parse the result.
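For example, with GNU grep and the sample file from the previous answer, grep -B1 shows the pairs but silently drops the unpaired Before line, so the result still needs post-processing:
$ grep -B1 After file
Before starting transaction id = 1
After starting transaction id = 1
Before starting transaction id = 2
After starting transaction id = 2
--
Before starting transaction id = 4
After starting transaction id = 4
After starting transaction id = 5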
Here is another way that also works if you get extraneous Before lines (the echo is just there so you can dry-run it):
$ echo 'Before starting transaction id = 123
After starting transaction id = 123
After starting transaction id = 54675
Before starting transaction id = 567
After starting transaction id = 567' |
sort -k6 | uniq -u -f5 # end cmd
After starting transaction id = 54675
This works by checking for unique IDs only. Since I don't know what kind of entries you are actually getting (perhaps they are duplicates of existing entries, in which case you would have to do it differently), here is a safer way that catches both cases and returns the occurrences whose ID count is anything other than 2:
$ echo 'Before starting transaction id = 123
After starting transaction id = 123
After starting transaction id = 567
Before starting transaction id = 567
After starting transaction id = 567' |
sort -k6 | uniq -c -f5 | grep -v "^[[:space:]]*2[[:space:]]"
3 After starting transaction id = 567
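If you only want the offending IDs rather than the whole lines, the final grep -v stage can be replaced with an awk filter on the count column (a sketch along the same lines):
$ sort -k6 file | uniq -c -f5 | awk '$1 != 2 {print $NF}'
For the sample input above this would print just 567. If the real log also contains unrelated lines, filter for the transaction statements first (e.g. grep 'transaction id') so they do not end up in the counts.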