计算匹配单词的出现次数

时间:2012-12-27 13:29:02

标签: unix sed awk

  

可能重复:
  need code in awk unix or use substr

我想在ctr{words}之间打印单词并计算单词的频率。

我试过了:

$ sed -n 's/.*ctr{\(.[^}]*\).*/\1/p' file

代码可以完美地打印出单词,但我需要计算单词的频率。

以上代码的输出:

Mo7afazat
JaishanaIN
ZainElKul
ZainUnlimited
AlBarakehNew
ZainElKulSN

但我需要:

1 Mo7afazat
1 JaishanaIN
2 ZainElKul
1 ZainUnlimited
1 AlBarakehNew
1 ZainElKulSN

文件是:

962796057604|mar0101|0|00000107A20E00000A6C331650B920340C00|0|0|400019FD7DBFBF7F|1001|962796057604|0 |01001|||-1|795971936|  00962795971936|16||-1|  00962795971936|-1|0|2|0|416019000659493|0||||||0|0|2012.12.01 00:07:09|12|30|0|516|16|1|2012.12.01 00:06:39|1|0||202|20001||0B12F1001104697209100300000000000000|1|1|11000|0|0||0881006972091003F000||0 714F610045584E6|000000000000|3|1|0000000000000000|0|140|0|0|0|0|0|0|||0|2|||||||||||||||||||||0|||0| |0|1|143|acf{0}cif{0}fcf{0}con{0}cuf{0}ctr{**Mo7afazat**}cgpa{962796057604}vlr{0096279001300}cff{0}roaf{0}mpty{0}ftksn{JMT}ftksr{0001}ftktp{CallTicketCPOCS} ||
1|34|2012.12.01 00:08:35|12|4|921-*203-0000000000-962796298894|mar0101|0|000001028225AE4AD868A8B750B900980C00|1|0|4000018001000002||962796298894|||||-1|||||-1||-1|0||||0||||||-1|-1|||-1|0|-1|-1|-1|2012.12.01 00:08:35|1|0||-1|0|||||||||||||0|0|||3797|0|12|-2147483648|-2147483648|-2147483648|-2147483648|||||||||||||||||||||||||0|||0||1|6|244|tid{111210532409329884}pfid{20}gob{1}rid{globitel} afid{}uid1{962796298894}aid1{1}ar1{0}uid2{globitel}aid2{-1}pid{1234}pur{!GDRC COMMIT AMOUNT 0}ratinf{}rec{0}rots{0}tda{}mid{}exd{0}reqa{0}ctr{**JaishanaIN**}ftksn{JMT}ftksr{0001}ftktp{PayCallTicket}||
1|34|2012.12.01 00:08:35|12|4|100-50-0-962796605155|mar0101|0|00000102A20400000A6A439D50B920520C00|0|0|400019FD7DBFBF7F|1001|962796605155|1 6||||-1|b116c||16||-1||-1|0|0|0|416017002233360|0||||||0|0|1970.01.01 02:00:00|0|0|0|220|0|1|1970.01.01 02:00:00|1|0||194|0||000000000000000000000000000000000000|0|0||0|0||00000000000000000000||0000000000 000000|000000000000|0|0|0000000000000000|0|370|0|0|0|0|0|0|||0|0|||||||||||||||||||||0|||0||0|1|70|a cf{3}ussd{1}ctr{**ZainElKul**}ftksn{JMT}ftksr{0001}ftktp{CallTicketCPOCS}||
1|34|2012.12.01 00:08:35|12|4|100-10-0
1|34|2012.12.01 00:08:35|12|4|921-*203-0000000000-962797611253|mar0101|0|0000010282B54BD015FF4C4B50B8F96E0C00|1|0|4000018001000002||962797611253|||||-1|||||-1||-1|0||||0||||||-1|-1|||-1|0|-1|-1|-1|2012.12.01 00:08:35|1|0||-1|0|||||||||||||0|0|||885|0|12|-2147483648|-2147483648|-2147483648|-2147483648|||||||||||||||||||||||||0|||0||1|6|243|tid{111220371293561120}pfid{20}gob{1}rid{globitel} afid{}uid1{962797611253}aid1{1}ar1{0}uid2{globitel}aid2{-1}pid{1234}pur{!GDRC COMMIT AMOUNT 0}ratinf{}rec{0}rots{0}tda{}mid{}exd{0}reqa{0}ctr{**ZainElKul**}ftksn{JMT}ftksr{0001}ftktp{PayCallTicket}||

-962795292027|mar0101|0|00000101A20200000A6A96B750B920300C00|0|0|400019FD7DBFBF7F|1001|962795292027|0 |01004|||-1|797196452|  00962797196452|16||-1|  00962797196452|-1|0|2|0|416018002276781|0||||||0|0|2012.12.01 00:07:09|12|12|23|516|16|1|2012.12.01 00:06:34|1|0||202|1||0B12F1001104697209100300000000000000|1|1|11000|0|0||0881006972091003F000||0714F 6100455AD67|000000000000|3|1|0000000000000000|0|30|0|0|0|0|0|0|||0|0|||||||||||||||||||||0|||0||0|1| 171|acf{0}cif{0}fcf{0}con{0}cuf{0}ctr{ZainUnlimited}cgpa{962795292027}vlr{0096279001300}cff{0}roaf{0}mpty{0}cacc{1;0;30}cquo{1;230;}ftksn{JMT}ftksr{000 1}ftktp{CallTicketCPOCS}||
1|34|2012.12.01 00:08:35|12|4|921-*203-0000000000-962796012818|mar0101|0|0000010882218115085D5F9150B920520C00|0|0|4000018001000002||962796012818|||||-1|||||-1||-1|0||||0||||||-1|-1|||-1|0|-1|-1|-1|2012.12.01 00:08:35|1|0||-1|1|||||||||||||0|0|||70|0|0|-2147483648|-2147483648|-2147483648|-2147483648|||||||||||||||||||||||||0|||0||1|6|258|tid{111221366974701289}pfid{17}gob{1}rid{globitel} afid{}uid1{962796012818}aid1{1}ar1{-2147483648}uid2{}aid2{-1}pid{DEFAULT_DECISION}pur{!GDRC Balance Check}ratinf{}rec{0}rots{0}tda{}mid{}exd{0}reqa{0}ctr{**AlBarakehNew**}ftksn{JMT}ftksr{0001}ftktp{PayCallTicket}||
1|34|2012.12.01 00:08:35|12|4|921-*203-0000000000-962797251349|mar0101|0|0000010282A451483EDFCFD350B920400C00|1|0|4000018001000002||962797251349|||||-1|||||-1||-1|0||||0||||||-1|-1|||-1|0|-1|-1|-1|2012.12.01 00:08:35|1|0||-1|0|||||||||||||0|0|||440|0|12|-2147483648|-2147483648|-2147483648|-2147483648|||||||||||||||||||||||||0|||0||1|6|245|tid{111211342745325133}pfid{20}gob{1}rid{globitel} afid{}uid1{962797251349}aid1{1}ar1{0}uid2{globitel}aid2{-1}pid{1234}pur{!GDRC COMMIT AMOUNT 0}ratinf{}rec{0}rots{0}tda{}mid{}exd{0}reqa{0}ctr{**ZainElKulSN**}ftksn{JMT}ftksr{0001}ftktp{PayCallTicket}||
1|34|2012.12.01 00:08:35|12|4|921-*203-0000000000-

2 个答案:

答案 0 :(得分:1)

只需在单行结尾添加| sort | uniq -c

变成:

sed -n 's/.*ctr{\(.[^}]*\).*/\1/p' file | sort | uniq -c

答案 1 :(得分:0)

你甚至不必使用正则表达式,看看:

cat test.txt | tr "}" "\n" | grep "ctr" | cut -d "{" -f 2 | sort | uniq -c

输出:

  1 AlBarakehNew
  1 JaishanaIN
  1 Mo7afazat
  2 ZainElKul
  1 ZainElKulSN
  1 ZainUnlimited