How to count occurrences of specific characters in a file with awk, using an associative array

Time: 2014-04-02 02:41:31

Tags: awk associative-array

My file looks like this:

manish@yahoo.com
Rajesh.patel@hotmail.in
jkl@gmail.uk
New123@utu.ac.in
qwe@gmail.co.in

I want to count the occurrences of each domain, like this:

Domain Name No of Email
-----------------------
com         1
in          3
uk          1

2 answers:

Answer 0 (score: 3)

Here is a pure POSIX awk solution (the sort program is invoked from inside the awk program):

awk -F. -v OFS='\t' '
    # Build an associative array that maps each unique top-level domain
    # (taken from the last `.`-separated field, `$NF`) to how often it
    # occurs in the input.
  { a[$NF]++ }

  END { 
      # Print the header.
    print "Domain Name", "No of Email"
    print "----------------------------"
     # Output the associative array and sort it (by top-level domain).
    for (k in a) print k, a[k] | "sort"
  }
' file
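To see why `$NF` picks out the top-level domain, it may help to print the field count and last field for each line. This is only an illustrative sketch: the temporary file and the helper name `show_last_field` are assumptions made for the demo, not part of the answer.

```shell
# Recreate the sample input from the question in a temporary file.
emails=$(mktemp)
printf '%s\n' \
  manish@yahoo.com \
  Rajesh.patel@hotmail.in \
  jkl@gmail.uk \
  New123@utu.ac.in \
  qwe@gmail.co.in > "$emails"

# Hypothetical helper: with "." as the field separator, print the number
# of fields (NF) and the last field ($NF) for every input line.
show_last_field() {
  awk -F. '{ print NF, $NF }' "$1"
}

show_last_field "$emails"
# prints:
# 2 com
# 3 in
# 2 uk
# 3 in
# 3 in
```

However many dots a line contains, `$NF` is always the text after the last one, which is what the array `a` is keyed on.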

If you have GNU awk 4.0 or later, you can do without the external sort, and can even easily control the sort field from inside the gawk program:

gawk -F. -v OFS='\t' '
    # Build an associative array that maps each unique top-level domain
    # (taken from the last `.`-separated field, `$NF`) to how often it
    # occurs in the input.
  { a[$NF]++ }

  END { 
      # Print the header.
    print "Domain Name", "No of Email"
    print "----------------------------"
     # Output the associative array and sort it (by top-level domain).
     # First, control output sorting by setting the order in which
     # the associative array is traversed, via the special
     # PROCINFO["sorted_in"] variable; e.g.:
     #  - Sort by top-level domain, ascending:  "@ind_str_asc"
     #  - Sort by occurrence count, descending: "@val_num_desc"
    PROCINFO["sorted_in"]="@ind_str_asc"
    for (k in a) print k, a[k]
  }
' file
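If gawk is not available, a similar by-count ordering can probably be had with POSIX tools alone by giving sort explicit keys. The sketch below is a swapped-in alternative, not the gawk mechanism above; the function name `count_tlds_by_freq` and the temporary file are assumptions made for illustration.

```shell
# Recreate the sample input from the question in a temporary file.
sample=$(mktemp)
printf '%s\n' \
  manish@yahoo.com \
  Rajesh.patel@hotmail.in \
  jkl@gmail.uk \
  New123@utu.ac.in \
  qwe@gmail.co.in > "$sample"

# Hypothetical helper: count top-level domains with awk, then let sort
# order the tab-separated lines by count (field 2, numeric, descending),
# breaking ties by domain name (field 1, ascending).
count_tlds_by_freq() {
  awk -F. -v OFS='\t' '
    { a[$NF]++ }
    END { for (k in a) print k, a[k] }
  ' "$1" | sort -t "$(printf '\t')" -k2,2nr -k1,1
}

count_tlds_by_freq "$sample"
# prints (columns separated by a tab):
# in    3
# com   1
# uk    1
```

Sorting outside awk keeps the program portable, at the cost of one extra process.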

Answer 1 (score: 2)

You can use sed, sort, and uniq:

sed 's/.*[.]//' input | sort | uniq -c

This gives:

  1 com
  3 in
  1 uk
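The sed expression works because `.*` is greedy: it matches as much as it can, so `.*[.]` consumes everything up to and including the last dot. A minimal sketch of that behaviour, using a helper name (`strip_to_tld`) that is made up for this example:

```shell
# Hypothetical helper: reduce one address to the text after its last dot.
strip_to_tld() {
  printf '%s\n' "$1" | sed 's/.*[.]//'
}

strip_to_tld 'New123@utu.ac.in'   # prints: in
strip_to_tld 'manish@yahoo.com'   # prints: com
```

Writing the dot as `[.]` (or `\.`) matters, since a bare `.` would match any character.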

With a little cosmetic cleanup using awk:

sed 's/.*[.]//' input | sort | uniq -c | \
     awk 'BEGIN{print "Domain Name No of Email\n-----------------------"} \
          {print $2"\t\t"$1}'

to get:

Domain Name No of Email
-----------------------
com     1
in      3
uk      1
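If the columns should line up exactly as in the question regardless of tab width, awk's printf with a fixed field width is one option. This is a sketch under assumptions: the width of 12 is read off the layout shown in the question, and the function name `format_tld_counts` and the temporary file are illustrative only.

```shell
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
printf '%s\n' \
  manish@yahoo.com \
  Rajesh.patel@hotmail.in \
  jkl@gmail.uk \
  New123@utu.ac.in \
  qwe@gmail.co.in > "$input"

# Hypothetical helper: same sed | sort | uniq -c pipeline, but the final
# awk left-justifies the domain in a 12-character column before the count.
format_tld_counts() {
  sed 's/.*[.]//' "$1" | sort | uniq -c |
    awk 'BEGIN { print "Domain Name No of Email"
                 print "-----------------------" }
         { printf "%-12s%s\n", $2, $1 }'
}

format_tld_counts "$input"
# prints:
# Domain Name No of Email
# -----------------------
# com         1
# in          3
# uk          1
```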