Question

My file looks like this:

manish@yahoo.com
Rajesh.patel@hotmail.in
jkl@gmail.uk
New123@utu.ac.in
qwe@gmail.co.in

I want to count the occurrences of each top-level domain and print them like this:

Domain Name No of Email
-----------------------
com         1
in          3
uk          1

Answer 1

Here is a POSIX-compliant awk solution (it pipes its output to the external sort utility from within the awk program):

awk -F. -v OFS='\t' '
  # Build an associative array that maps each unique top-level domain
  # (taken from the last `.`-separated field, `$NF`) to how often it
  # occurs in the input.
  { a[$NF]++ }

  END {
    # Print the header.
    print "Domain Name", "No of Email"
    print "----------------------------"
    # Output the associative array, sorted by top-level domain.
    for (k in a) print k, a[k] | "sort"
  }
' file
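
If the counts should be ordered by frequency rather than by domain, one option that stays within POSIX tools is to move the sort outside awk and give it a numeric key. The following is only a sketch under assumptions: the file name emails.txt is hypothetical, the header is left out so that only data lines are sorted, and the `NF` guard that skips empty lines is an addition that is not part of the answer above.

```shell
# Sample addresses from the question, written to a hypothetical file name.
cat > emails.txt <<'EOF'
manish@yahoo.com
Rajesh.patel@hotmail.in
jkl@gmail.uk
New123@utu.ac.in
qwe@gmail.co.in
EOF

# Count by the last dot-separated field, then sort on column 2
# (numeric, descending), breaking ties alphabetically on column 1.
result=$(awk -F. -v OFS='\t' '
  NF { a[$NF]++ }                     # NF skips empty lines (added guard)
  END { for (k in a) print k, a[k] }
' emails.txt | sort -t "$(printf '\t')" -k2,2nr -k1,1)

printf '%s\n' "$result"
```

With the sample data this lists `in` (3) first, followed by `com` and `uk` (1 each).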

If you have GNU awk 4.0 or later, you can do without the external sort and even control the sort field easily from within the gawk program:

gawk -F. -v OFS='\t' '
  # Build an associative array that maps each unique top-level domain
  # (taken from the last `.`-separated field, `$NF`) to how often it
  # occurs in the input.
  { a[$NF]++ }

  END {
    # Print the header.
    print "Domain Name", "No of Email"
    print "----------------------------"
    # Control the output order by setting the special
    # PROCINFO["sorted_in"] variable, which determines the order in
    # which `for (k in a)` traverses the array; e.g.:
    #  - Sort by top-level domain, ascending:  "@ind_str_asc"
    #  - Sort by occurrence count, descending: "@val_num_desc"
    PROCINFO["sorted_in"] = "@ind_str_asc"
    for (k in a) print k, a[k]
  }
' file
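
The comments above mention "@val_num_desc" as an alternative traversal order. A minimal sketch of that variant follows, assuming gawk 4.0 or later is installed; the file name emails.txt is hypothetical and the header lines are omitted for brevity.

```shell
# Sample addresses from the question, written to a hypothetical file name.
cat > emails.txt <<'EOF'
manish@yahoo.com
Rajesh.patel@hotmail.in
jkl@gmail.uk
New123@utu.ac.in
qwe@gmail.co.in
EOF

# PROCINFO["sorted_in"] is a gawk extension; other awks ignore it.
result=$(gawk -F. -v OFS='\t' '
  { a[$NF]++ }
  END {
    # Traverse the array by value (the count), numerically, descending.
    PROCINFO["sorted_in"] = "@val_num_desc"
    for (k in a) print k, a[k]
  }
' emails.txt)

printf '%s\n' "$result"
```

With the sample data, `in` (count 3) comes first. The relative order of `com` and `uk`, which tie at 1, should not be relied on.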

Answer 2

You can use sed, sort, and uniq:

sed 's/.*[.]//' input | sort | uniq -c

which gives:

  1 com
  3 in
  1 uk

Add some cosmetics with awk:

sed 's/.*[.]//' input | sort | uniq -c | \
     awk 'BEGIN{print "Domain Name No of Email\n-----------------------"} \
          {print $2"\t\t"$1}'

to get:

Domain Name No of Email
-----------------------
com     1
in      3
uk      1
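
Tab-separated columns line up only as long as the domain names stay shorter than a tab stop. As a possible alternative, the sketch below uses awk's printf with a fixed column width to reproduce the space-padded layout shown in the question. The file name emails.txt and the width of 12 are assumptions chosen to match the sample.

```shell
# Sample addresses from the question, written to a hypothetical file name.
cat > emails.txt <<'EOF'
manish@yahoo.com
Rajesh.patel@hotmail.in
jkl@gmail.uk
New123@utu.ac.in
qwe@gmail.co.in
EOF

# uniq -c emits "count domain"; swap the columns and left-justify the
# domain in a 12-character field (assumed width).
result=$(sed 's/.*[.]//' emails.txt | sort | uniq -c |
  awk 'BEGIN { printf "%-12s%s\n", "Domain Name", "No of Email"
               print "-----------------------" }
             { printf "%-12s%s\n", $2, $1 }')

printf '%s\n' "$result"
```

A wider field (for example `%-20s`) would be needed if longer domain labels can occur.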

How to count the occurrences of a specific character in a file with the awk command using associative arrays
