awk to count found strings and print the ones that are missing

Date: 2016-03-08 16:14:26

Tags: awk

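I'm not getting the correct output from the awk below. Basically, if a string from file1 is not found in file2, it should be printed as "missing". If the string is found, it should be counted as "found". As it stands, it returns a zero-byte file. Thank you :).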

file1

A2M
A4GALT
AGRN

file2

chr1    955543  955763  chr1:955543-955763  AGRN-6|gc=75
chr1    957571  957852  chr1:957571-957852  AGRN-7|gc=61.2

AWK

awk -F'[ -]' 'NR == FNR { seen[$0]; next } !seen[$6]++ { n++ }
END { print n " ids found"; for (i in seen) if (!seen[i]) print i " missing" }' file1 file2

Desired output

1 id found (`since the AGRN string was found`)
A2M missing
A4GALT missing

1 answer:

Answer 0 (score: 1)
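The key step is choosing a field separator that puts the gene symbol from file2 into a single field. As a minimal sketch (the helper name `show_fields` is made up for illustration), this splits the first sample line of file2 with the same `FS` regex that the m.awk script below uses, and prints each field with its number:

```shell
#!/bin/sh
# Hypothetical helper: print every field of one line, split the way m.awk
# splits file2, i.e. on runs of whitespace OR on a single hyphen.
show_fields() {
  printf '%s\n' "$1" | awk 'BEGIN { FS = "[[:space:]]+|-" }
    { for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }'
}

show_fields 'chr1    955543  955763  chr1:955543-955763  AGRN-6|gc=75'
```

With this separator the line splits into 7 fields: the coordinate column `chr1:955543-955763` is broken into `$4` and `$5` at its hyphen, so `$6` is `AGRN` and the `6|gc=75` suffix lands in `$7`.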

$ cat m.awk
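BEGIN { FS="[[:space:]]+|-" }                   # split on runs of whitespace or a hyphen
NR == FNR { seen[$0]; next }                    # file1: store each wanted ID
$6 in seen { found[$6]; delete seen[$6] }       # file2: $6 is the gene symbol; move it from seen to found
END { print length(found) " ids found"          # number of distinct IDs found
      for (i in seen) print i " missing" }      # IDs never matched in file2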

$ awk -f m.awk file1 file2
1 ids found
A4GALT missing
A2M missing
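Not every awk implementation accepts an array as the argument to `length()`. A sketch of the same logic that counts with a plain counter instead, assuming an awk that understands `[[:space:]]` in regular expressions (the wrapper function `count_ids` and the temporary-file setup are illustrative only, not part of the answer above):

```shell
#!/bin/sh
# Hypothetical wrapper around the m.awk logic: count distinct IDs from
# file1 found in field 6 of file2 with a plain counter (no length(array)).
count_ids() {
  awk 'BEGIN { FS = "[[:space:]]+|-" }
       NR == FNR { seen[$0]; next }         # file1: remember each wanted ID
       $6 in seen { n++; delete seen[$6] }  # file2: count an ID once, then forget it
       END {
         printf "%d ids found\n", n         # n prints as 0 if nothing matched
         for (i in seen) print i " missing" # IDs that never appeared in file2
       }' "$1" "$2"
}

# Sample data from the question, written to a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' A2M A4GALT AGRN > "$tmp/file1"
printf '%s\n' \
  'chr1    955543  955763  chr1:955543-955763  AGRN-6|gc=75' \
  'chr1    957571  957852  chr1:957571-957852  AGRN-7|gc=61.2' > "$tmp/file2"

count_ids "$tmp/file1" "$tmp/file2"
rm -r "$tmp"
```

Because `delete seen[$6]` removes an ID on its first hit, the second AGRN line no longer matches `$6 in seen`, so `n` counts distinct IDs just as `found` does in m.awk. The run prints `1 ids found` followed by `A2M missing` and `A4GALT missing`, in whatever order awk iterates the array (that order is unspecified).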