Using awk

Time: 2016-04-21 21:19:30

Tags: awk

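With the awk below I seem to be getting the wrong count. Basically, the IDs with a - in their name are not found in file, even though they are in the input that is searched. I'm not sure what the error in the command is. Thank you :).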

Input

SEPT12
SEPT5-GP1BB
SEPT9
HLA-DRB1
HLA-DRB5

File

chr16 4837470 4837656 SEPT12
chr16 4837536 4837656 SEPT12
chr22 19711038 19711157 SEPT5-GP1BB
chr22 19711038 19711157 SEPT5-GP1BB
chr22 19711366 19711997 SEPT5-GP1BB
chr22 19711367 19711997 SEPT5-GP1BB
chr22 19711367 19711997 SEPT5-GP1BB
chr17 75398130 75398795 SEPT9
chr17 75471590 75471995 SEPT9
chr17 75478215 75478427 SEPT9
chr6 32487136 32487438 HLA-DRB1
chr6 32489671 32489961 HLA-DRB1
chr6 32551875 32552165 HLA-DRB5

Current output

2 ids found
SEPT5-GP1BB missing
HLA-DRB1 missing
HLA-DRB5 missing

Desired output

5 ids found

awk (missing.awk)

BEGIN { FS="[[:space:]]+|-" }
NR == FNR { seen[$0]; next }
$4 in seen { found[$4]; delete seen[$4] }
END { print length(found) " ids found"
  for (i in seen) print i " missing" }

awk -f missing.awk input file > out

2 answers:

Answer 0: (score: 2)

Try something like this:

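awk '
    # First file (input): remember every ID to look for.
    NR==FNR { lookup[$0]++; next }
    # Second file: record IDs whose 4th field is in the lookup table.
    ($4 in lookup) { seen[$4]++ }
    END {
        print length(seen) " ids found"
        # Whatever was never seen is missing.
        for (id in seen) delete lookup[id]
        for (id in lookup) print id " is missing"
    }' input file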
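
One way to check this against the question's data is the sketch below. It assumes the sample is recreated in a scratch directory (the paths and the result variable are made up for illustration), and the file is abbreviated to one line per ID. Because FS is left at its default, $4 keeps its hyphen.

```shell
# Recreate the sample data in a scratch directory (hypothetical paths).
dir=$(mktemp -d)
printf '%s\n' SEPT12 SEPT5-GP1BB SEPT9 HLA-DRB1 HLA-DRB5 > "$dir/input"
cat > "$dir/file" <<'EOF'
chr16 4837470 4837656 SEPT12
chr22 19711038 19711157 SEPT5-GP1BB
chr17 75398130 75398795 SEPT9
chr6 32487136 32487438 HLA-DRB1
chr6 32551875 32552165 HLA-DRB5
EOF

# Same logic as the answer, with default whitespace field splitting.
result=$(awk '
    NR==FNR { lookup[$0]++; next }
    ($4 in lookup) { seen[$4]++ }
    END {
        print length(seen) " ids found"
        for (id in seen) delete lookup[id]
        for (id in lookup) print id " is missing"
    }' "$dir/input" "$dir/file")

echo "$result"
rm -rf "$dir"
```

With this data every ID is matched, so only the count line is printed and no "is missing" lines follow.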

Answer 1: (score: 1)

What exactly does your output represent? Are you counting the number of elements in input that were found in file, regardless of how many times they were found?

If so, I think your code is fine as long as you don't (mis)set FS. It seems to work for me when I comment that line out.
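
To illustrate why that FS setting matters, here is a minimal sketch using one line from the sample file (the variable names custom and default are only for illustration). With - included in FS, the ID is split in two and $4 no longer matches the key stored from input, where the whole line $0 was used.

```shell
# One line taken from the sample file in the question.
line='chr22 19711038 19711157 SEPT5-GP1BB'

# FS from missing.awk: "-" is also a separator, so field 4 is cut short.
custom=$(printf '%s\n' "$line" | awk 'BEGIN { FS="[[:space:]]+|-" } { print $4 }')

# Default FS: fields are split on whitespace only, so field 4 is the full ID.
default=$(printf '%s\n' "$line" | awk '{ print $4 }')

echo "custom FS:  $custom"
echo "default FS: $default"
```

The first value comes out as SEPT5 and the second as SEPT5-GP1BB, which is why only the IDs without a hyphen (SEPT12 and SEPT9) were counted.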