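Using the awk below, I seem to be getting the wrong count. Basically, the IDs in input that have a - in their name are not found in file, even though they are there. I'm not sure what the error in the command is. Thank you :)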
input
SEPT12
SEPT5-GP1BB
SEPT9
HLA-DRB1
HLA-DRB5
file
chr16 4837470 4837656 SEPT12
chr16 4837536 4837656 SEPT12
chr22 19711038 19711157 SEPT5-GP1BB
chr22 19711038 19711157 SEPT5-GP1BB
chr22 19711366 19711997 SEPT5-GP1BB
chr22 19711367 19711997 SEPT5-GP1BB
chr22 19711367 19711997 SEPT5-GP1BB
chr17 75398130 75398795 SEPT9
chr17 75471590 75471995 SEPT9
chr17 75478215 75478427 SEPT9
chr6 32487136 32487438 HLA-DRB1
chr6 32489671 32489961 HLA-DRB1
chr6 32551875 32552165 HLA-DRB5
current output
2 ids found
SEPT5-GP1BB missing
HLA-DRB1 missing
HLA-DRB5 missing
desired output
5 ids found
awk (missing.awk)
BEGIN { FS="[[:space:]]+|-" }
NR == FNR { seen[$0]; next }
$4 in seen { found[$4]; delete seen[$4] }
END { print length(found) " ids found"
for (i in seen) print i " missing" }
awk -f missing.awk input file > out
Answer 0 (score: 2)
Try something like this:
awk '
    NR == FNR      { lookup[$0]++; next }
    ($4 in lookup) { seen[$4]++ }
    END {
        print length(seen) " ids found"
        for (id in seen) delete lookup[id]
        for (id in lookup) print id " is missing"
    }' input file
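To check this approach against the sample data from the question, here is a small shell sketch. It recreates input and file from the listings above in a scratch directory (assumption: whitespace-separated columns, Unix line endings) and runs an equivalent awk program with the default FS. It counts matches with an explicit counter instead of length(seen), since length() on an array is an extension that some older awks lack. The function name count_ids is only for illustration.

```shell
#!/bin/sh
# Scratch directory so the sample files do not clobber real ones.
dir=$(mktemp -d) || exit 1
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# IDs from the question's "input".
printf '%s\n' SEPT12 SEPT5-GP1BB SEPT9 HLA-DRB1 HLA-DRB5 > input

# Records from the question's "file" (ID in column 4).
cat > file <<'EOF'
chr16 4837470 4837656 SEPT12
chr16 4837536 4837656 SEPT12
chr22 19711038 19711157 SEPT5-GP1BB
chr22 19711038 19711157 SEPT5-GP1BB
chr22 19711366 19711997 SEPT5-GP1BB
chr22 19711367 19711997 SEPT5-GP1BB
chr22 19711367 19711997 SEPT5-GP1BB
chr17 75398130 75398795 SEPT9
chr17 75471590 75471995 SEPT9
chr17 75478215 75478427 SEPT9
chr6 32487136 32487438 HLA-DRB1
chr6 32489671 32489961 HLA-DRB1
chr6 32551875 32552165 HLA-DRB5
EOF

# Default FS (runs of blanks/tabs), so a "-" inside an ID stays part of $4.
count_ids() {
    awk '
        NR == FNR { lookup[$0]; next }        # first file: remember every ID
        ($4 in lookup) && !($4 in seen) {     # second file: first hit of an ID
            seen[$4]
            n++
        }
        END {
            print n + 0 " ids found"
            for (id in lookup)
                if (!(id in seen)) print id " is missing"
        }' "$1" "$2"
}

count_ids input file    # expected: 5 ids found
```

Because each ID is counted only on its first hit, the duplicated SEPT5-GP1BB rows do not inflate the total, and no "missing" lines are printed for this data.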
Answer 1 (score: 1)
What exactly does your output represent? Are you counting the number of elements in input that were found in file, regardless of how many times they were found?
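If so, I think your code is fine as long as you don't (mis)set FS. Because your FS also splits on -, $4 in file holds only the part of a hyphenated ID before the first -, so it never matches the full ID stored from input. It seems to work for me when I comment that line out.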
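The effect of FS can be seen by printing $4 for one hyphenated record under both settings. This minimal shell sketch uses a single line copied from file. The keys in seen come from $0 of input, which FS does not change, so "SEPT5-GP1BB" is stored whole, while the question's FS leaves only "SEPT5" in $4. The helper names field4_custom and field4_default are hypothetical.

```shell
#!/bin/sh
# One record copied from the question's "file".
line='chr22 19711038 19711157 SEPT5-GP1BB'

# Question's FS: blanks OR a hyphen, so the ID is split into $4 and $5.
field4_custom() {
    printf '%s\n' "$line" | awk 'BEGIN { FS = "[[:space:]]+|-" } { print $4 }'
}

# Default FS: runs of blanks/tabs only, so the hyphenated ID stays whole.
field4_default() {
    printf '%s\n' "$line" | awk '{ print $4 }'
}

field4_custom    # prints SEPT5
field4_default   # prints SEPT5-GP1BB
```

Removing the BEGIN line from missing.awk gives the second behavior. If an explicit FS is preferred, dropping only the |- alternative should also work for this sample data.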