Filtering with awk and grep?

Time: 2014-12-29 12:01:10

Tags: awk grep find

This is the log file I want to filter:

xxxyyy.com/plugins/status.gif?type=videoprogress;status=first;sid=6941c712-ca83-4aa1-a69a-931ca66df655;vid=606829;vrid=61478182;pid=1545;cid=IN;cpid=1545
xxxyyy.com/plugins/status.gif?type=videoprogress;status=mid;sid=6941c712-ca83-4aa1-a69a-931ca66df655;vid=606829;vrid=61478182;pid=1545;cid=US;cpid=1545
xxxyyy.com/plugins/status.gif?type=videoprogress;status=third;sid=6941c712-ca83-4aa1-a69a-931ca66df655;vid=606829;vrid=61478182;pid=1545;cid=US;cpid=1545
xxxyyy.com/plugins/status.gif?type=videoprogress;status=complete;sid=6941c712-ca83-4aa1-a69a-931ca66df655;vid=606829;vrid=61478182;pid=1545;cid=IN;cpid=1545
xxxyyy.com/plugins/status.gif?type=videoothers;status=pause;sid=6941c712-ca83-4aa1-a69a-931ca66df655;vid=606829;vrid=61478182;pid=1545;cid=IN;cpid=1545
xxxyyy.com/plugins/status.gif?type=videoothers;status=mute;sid=6941c712-ca83-4aa1-a69a-931ca66df655;vid=606829;vrid=61478182;pid=1547;cid=IN;cpid=1547
xxxyyy.com/plugins/status.gif?type=videoothers;status=unmute;sid=6941c712-ca83-4aa1-a69a-931ca66df655;vid=606829;vrid=61478182;pid=1545;cid=IN;cpid=1545
xxxyyy.com/plugins/status.gif?type=videoothers;status=error;sid=6941c712-ca83-4aa1-a69a-931ca66df656;vid=606829;vrid=61478182;pid=1546;cid=IN;cpid=1546

I need output like this:

pid  cid cpid Count  
1545 IN  1545   4  
1545 US  1545   2  
1546 IN  1546   1    
1547 IN  1547   1  

Can someone please help me?

3 answers:

Answer 0 (score: 1)

Quick and dirty:

kent$  awk -F';' '{a[$(NF-2) OFS $(NF-1) OFS $NF]++}
                   END{for(x in a)print x, a[x]}' file
pid=1547 cid=IN cpid=1547 1
pid=1545 cid=US cpid=1545 2
pid=1546 cid=IN cpid=1546 1
pid=1545 cid=IN cpid=1545 4

Now you can adjust the output to fit the format you need.

Answer 1 (score: 0)

A small variation on Kent's answer:
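One possible way to do that adjustment, as a rough sketch: strip the `name=` prefix from each of the last three fields, count the resulting tuples, and sort the rows, since the traversal order of `for (x in a)` is unspecified in awk. The function name `count_tuples` is made up for illustration, and the header line assumes the last three fields are always pid, cid and cpid, as in the log above.

```shell
# count_tuples is a hypothetical helper name; it reads the log on stdin.
count_tuples() {
  awk -F';' '
    {
      key = ""
      # Walk the last three fields and keep only the part after the first "=".
      for (i = NF - 2; i <= NF; i++) {
        v = $i
        sub(/^[^=]*=/, "", v)
        key = key (i > NF - 2 ? " " : "") v
      }
      count[key]++
    }
    END { for (k in count) print k, count[k] }
  ' | sort | { echo "pid cid cpid count"; cat; }   # assumed header, printed before the sorted rows
}
```

It could then be run as `count_tuples < file`.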

awk -F';' '{ split($6,pid,"="); split($7,cid,"="); split($8,cpid,"="); n[pid[2] OFS cid[2] OFS cpid[2]]++; } END { print "pid","cid","cpid","count"; for (p in n) { print p,n[p] } }' input.txt

Which gives:

pid cid cpid count
1545 IN 1545 4
1545 US 1545 2
1546 IN 1546 1
1547 IN 1547 1

The same code, with comments:

{ 
  split($6,pid,"="); split($7,cid,"="); split($8,cpid,"="); # split each name=value pair; element 2 of each array holds the value
  n[pid[2] OFS cid[2] OFS cpid[2]]++; # use the three values as one array key and increment its count
} 
END { 
  print "pid","cid","cpid","count"; # print the header
  for (p in n) { print p,n[p] } # print the key (tuples) and the count of it
}
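Both snippets above assume that pid, cid and cpid always sit in fields 6 to 8 (or in the last three fields). If the order of the pairs could vary between lines, which the sample log does not show, a sketch along these lines looks each value up by name instead. The function name `count_by_name` and the array name `kv` are made up for illustration.

```shell
# count_by_name is a hypothetical helper name; it reads the log on stdin.
count_by_name() {
  awk -F';' '
    {
      split("", kv)                       # clear the per-line lookup table
      for (i = 1; i <= NF; i++) {
        eq = index($i, "=")               # position of the first "=" in this field
        if (eq) kv[substr($i, 1, eq - 1)] = substr($i, eq + 1)
      }
      # Count the tuple by name, wherever the pairs appeared on the line.
      n[kv["pid"] OFS kv["cid"] OFS kv["cpid"]]++
    }
    END {
      print "pid", "cid", "cpid", "count"
      for (p in n) print p, n[p]
    }
  '
}
```

Usage would be `count_by_name < input.txt`. A line that lacks one of the three names simply contributes an empty value for it.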

Answer 2 (score: 0)

Another way, similar to the others:

awk -F';' '{for(i=0;i<3;i++){split($(NF-i),a,"=");x=a[2]" "x;NR==1&&y=a[1]" "y}
            b[x]++;x=z}END{print y "count";for(i in b)print i b[i]}' file

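This extracts the name and the value of each of the last three pairs, then increments an array that uses the values as its key. In END it prints the extracted names plus a new count header, then loops over the array and prints each key (the values) with its number of occurrences.

Output: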

pid cid cpid count
1547 IN 1547 1
1545 IN 1545 4
1546 IN 1546 1
1545 US 1545 2
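For readability, the one-liner in this answer can be spread out roughly as follows. This is only a sketch of the same idea with longer variable names and an explicit `if` in place of the `NR==1&&...` shortcut; the function name `count_last_three` is made up for illustration.

```shell
# count_last_three is a hypothetical helper name; it reads the log on stdin.
count_last_three() {
  awk -F';' '
    {
      key = ""
      for (i = 0; i < 3; i++) {
        split($(NF - i), pair, "=")                 # pair[1] = name, pair[2] = value
        key = pair[2] " " key                       # prepend, so values end up in file order
        if (NR == 1) header = pair[1] " " header    # take the header names from the first line only
      }
      seen[key]++
    }
    END {
      print header "count"
      for (k in seen) print k seen[k]               # key already ends with a space
    }
  '
}
```

It could be run as `count_last_three < file`; as in the output above, the row order is whatever order awk happens to traverse the array in.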