My file looks like this:
manish@yahoo.com
Rajesh.patel@hotmail.in
jkl@gmail.uk
New123@utu.ac.in
qwe@gmail.co.in
I want to count the occurrences of each top-level domain, like this:
Domain Name No of Email
-----------------------
com 1
in 3
uk 1
Answer 0 (score: 3)
Here is a pure POSIX awk solution (with sort invoked from inside the awk program):
awk -F. -v OFS='\t' '
# Build an associative array that maps each unique top-level domain
# (taken from the last `.`-separated field, `$NF`) to how often it
# occurs in the input.
{ a[$NF]++ }
END {
# Print the header.
print "Domain Name", "No of Email"
print "----------------------------"
# Output the associative array and sort it (by top-level domain).
for (k in a) print k, a[k] | "sort"
}
' file
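The same POSIX approach can also order the result by count rather than by domain, since the string after the pipe is an ordinary shell command. The following is a minimal sketch, not part of the original answer: it feeds the question's sample addresses through printf instead of reading a file, omits the header, and assumes that the sort keys `-k2,2nr -k1,1` (count descending, then domain) match the ordering you want.

```shell
# Sample input: the five addresses from the question (inlined here as an assumption).
result=$(printf '%s\n' \
    manish@yahoo.com Rajesh.patel@hotmail.in jkl@gmail.uk \
    New123@utu.ac.in qwe@gmail.co.in |
  awk -F. -v OFS='\t' '
    # Count each top-level domain (the last dot-separated field).
    { a[$NF]++ }
    END {
      # Sort by the count (field 2) numerically, descending;
      # break ties by the domain (field 1).
      for (k in a) print k, a[k] | "sort -k2,2nr -k1,1"
    }
  ')
printf '%s\n' "$result"
```

With the sample data, the domain in comes first with a count of 3, followed by com and uk with a count of 1 each.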
If you have GNU awk 4.0 or higher, you can do without the external sort and even easily control the sort field from inside the gawk program:
gawk -F. -v OFS='\t' '
# Build an associative array that maps each unique top-level domain
# (taken from the last `.`-separated field, `$NF`) to how often it
# occurs in the input.
{ a[$NF]++ }
END {
# Print the header.
print "Domain Name", "No of Email"
print "----------------------------"
# Output the associative array and sort it (by top-level domain).
# First, control the output order by setting the order in which
# the associative array is traversed, via the special
# PROCINFO["sorted_in"] variable; e.g.:
# - Sort by top-level domain, ascending: "@ind_str_asc"
# - Sort by occurrence count, descending: "@val_num_desc"
PROCINFO["sorted_in"]="@ind_str_asc"
for (k in a) print k, a[k]
}
' file
Answer 1 (score: 2)
You can use sed, sort, and uniq:
sed 's/.*[.]//' input | sort | uniq -c
which gives:
1 com
3 in
1 uk
With some cosmetic formatting using awk:
sed 's/.*[.]//' input | sort | uniq -c | \
awk 'BEGIN{print "Domain Name No of Email\n-----------------------"} \
{print $2"\t\t"$1}'
to get:
Domain Name No of Email
-----------------------
com 1
in 3
uk 1
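To see why the sed expression in this answer isolates the top-level domain, it may help to run it on a single address. This is a small illustrative sketch, not part of the original answer; the variable name tld is made up for the example. Because `.*` is greedy, the pattern `.*[.]` matches everything up to and including the last dot, and the substitution deletes it.

```shell
# Greedy .* consumes everything up to the LAST dot,
# so only the final label of the address survives.
tld=$(printf '%s\n' 'New123@utu.ac.in' | sed 's/.*[.]//')
printf '%s\n' "$tld"   # prints: in
```

A line that contains no dot at all is left unchanged by this substitution, so it would be counted under its full text.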