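How do I get unique occurrences from a file?

Asked: 2013-02-25 17:48:11

Tags: linux perl bash sed awk

I'm having trouble getting the unique occurrences of DeviceId from a log file whose format is similar to the following: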

log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"123"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}

The output I'm expecting is something like this:

log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}

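I tried using awk, but I can't seem to work it out. Does anyone know how to do this?

I know there should be a way to print the DeviceId with awk, but I can't figure it out. Once I have the DeviceId, I can pipe it straight to sort and uniq.

5 answers: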

Answer 0 (score: 4)

Using Perl:

perl -lne 'if ( m{"DeviceId":" ([^"]+) "}xms ) { print if not $seen{$1}++; }' <log

Answer 1 (score: 4)

Using GNU awk:

gawk 'match($0, /DeviceId":"([^"]+)/, a) && seen[a[1]]++ == 0' log

Given your input, this outputs:

log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}

Note that this is essentially a gawk translation of @Perleone's answer, although I hadn't noticed that at the time.
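
The three-argument match() used above is a gawk extension. As a minimal sketch, assuming only a POSIX awk is available, the same first-seen filter can be written with the two-argument match() and the RSTART/RLENGTH variables it sets. The function name unique_by_device_id is made up for illustration.

```shell
#!/bin/sh
# Sketch: print each line whose DeviceId has not been seen before.
# unique_by_device_id is a hypothetical helper name, not from the thread.
unique_by_device_id() {
    awk '
        match($0, /"DeviceId":"[^"]+/) {
            # The matched text starts with the 12-character prefix "DeviceId":"
            # so skipping 12 characters leaves only the ID value.
            id = substr($0, RSTART + 12, RLENGTH - 12)
            if (!seen[id]++) print
        }
    ' "$@"
}
```

It reads the files given as arguments, or standard input when none are given, for example: unique_by_device_id log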

Answer 2 (score: 1)

Based on @cnicutar's answer, using sed, sort, and cut:

sed 's/.*"DeviceId":"\([0-9]*\).*/\1\t&/' file | sort -u -k 1,1 | cut -f 2

Output:

log: {"deviceInfo":{"DeviceId":"123","device":"Android"}
log: {"deviceInfo":{"device":"Android","DeviceId":"234"}
log: {"deviceInfo":{"device":"iPhone","DeviceId":"323"}

Answer 3 (score: 1)

Unique device IDs using awk:

$ awk '/DeviceId/&&!a[$1]++&&gsub(/[^[:digit:]]/,"")' RS='[{,}]' file
123
234
323

The advantage of awk is its associative arrays: there is no need to pipe to sort -u.
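
To illustrate the associative-array point, here is a minimal sketch of the dedupe idiom on its own. It keeps the first occurrence of each input line and preserves input order, which sort -u does not. The function name dedupe_keep_order is hypothetical.

```shell
#!/bin/sh
# Sketch: drop repeated lines while keeping first-seen order.
# seen[$0]++ evaluates to 0 (false) the first time a line appears, so the
# negated expression is true only for the first occurrence.
dedupe_keep_order() {
    awk '!seen[$0]++' "$@"
}
```

For the input lines 323, 123, 323, 234 this prints 323, 123, 234 in that order, whereas sort -u would emit 123, 234, 323.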

Answer 4 (score: 1)

It would be better to parse the JSON properly, but here is another quick awk:

awk -F'.*DeviceId":"|["}]' '!A[$2]++' file

Applying Ed Morton's suggestion to shave off 3 characters:

awk -F'.*DeviceId":"|"' '!A[$2]++' file
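
To see why $2 holds the ID here, a small sketch that prints the field instead of filtering on it may help. The first alternative of the separator swallows everything up to and including DeviceId":" so $1 is empty, and the next double quote ends $2. The helper name show_device_id_field is hypothetical, and this assumes every line contains a DeviceId.

```shell
#!/bin/sh
# Sketch: print the field that the one-liner above uses as its dedupe key.
# Assumes each input line contains exactly one "DeviceId":"..." pair.
show_device_id_field() {
    awk -F'.*DeviceId":"|"' '{ print $2 }' "$@"
}
```

Running show_device_id_field file on the sample log should list one ID per input line, duplicates included, which is the value that !A[$2]++ then filters on.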