Question

I'm trying to join tables that share some fields into a single table.

file1 looks like this:

organism score_1
orgA 1
orgC 0

file2 looks like this:

organism score_2
orgA 1
orgD 0

And I'm joining them with the following:

join -e 0 -v1 -j 1 --header file1.txt file2.txt > compile.txt

But the result is this:

organism score_1 score_2
orgA 1
orgA 1
orgC 0
orgD 0

What I want to get is this:

organism score_1 score_2
orgA 1 1
orgC 0 0
orgD 0 0

Any suggestions on how to solve this?

Answer 1

Here is one in awk:

$ awk '
NR==FNR {                  # hash file1 to hash a
    a[$1]=$2
    next
}
{                          # process file2
    if($1 in a) {          # if $1 in file1 
        print $1,a[$1],$2  # output 
        delete a[$1]       # ... and delete
    } else                 # if not found in file1
        print $1,0,$2      # fill the missing score_1 with 0
}
END {                      # output the leftovers from file1
    for(i in a)            # order is awk implementation specific
        print i,a[i],0     # fill the missing score_2 with 0
}' file1 file2

Output:

organism score_1 score_2
orgA 1 1
orgD 0 0
orgC 0 0
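
Since the order of the END loop is awk implementation specific, the rows may not come out in the order shown in the question. A minimal sketch of one way to get a stable order, assuming the header should stay first and the remaining rows should be sorted by organism. The scratch directory and the file names inside it are only for illustration, and missing scores are filled with 0 as in the question's desired output:

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'organism score_1' 'orgA 1' 'orgC 0' > "$tmp/file1"
printf '%s\n' 'organism score_2' 'orgA 1' 'orgD 0' > "$tmp/file2"

# Merge with awk, then keep the header line first and sort the data rows.
sorted=$(
  awk '
    NR==FNR { a[$1]=$2; next }                           # remember file1 scores
    $1 in a { print $1, a[$1], $2; delete a[$1]; next }  # key in both files
            { print $1, 0, $2 }                          # key only in file2
    END     { for (i in a) print i, a[i], 0 }            # keys only in file1
  ' "$tmp/file1" "$tmp/file2" |
  { IFS= read -r header; printf '%s\n' "$header"; sort; }
)
printf '%s\n' "$sorted"
rm -r "$tmp"
```

The `read` consumes only the header line from the pipe, so `sort` sees just the data rows.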

Answer 2

The trick is to use -a rather than -v (incidentally, your join invocation does not produce the output you say it does):

$ join --header -e 0 -a1 -a2 -j1 -o auto file1.txt file2.txt
organism score_1 score_2
orgA 1 1
orgC 0 0
orgD 0 0

(This requires the GNU version of join, but since you're already using --header, I assume that's not a problem.)
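
To see the difference between the two options side by side, here is a small sketch that recreates the sample files and runs both variants. It assumes GNU join and a scratch directory created with mktemp; the variable names are only for illustration:

```shell
# Recreate the sample files (already sorted on the join field, as join expects).
tmp=$(mktemp -d)
printf '%s\n' 'organism score_1' 'orgA 1' 'orgC 0' > "$tmp/file1.txt"
printf '%s\n' 'organism score_2' 'orgA 1' 'orgD 0' > "$tmp/file2.txt"

# -v1: after the header, print only the file1 lines with no partner in file2;
# the joined lines are suppressed.
only_in_1=$(join --header -v1 -j1 "$tmp/file1.txt" "$tmp/file2.txt")

# -a1 -a2: keep the joined lines and also print unpaired lines from both files;
# -o auto lets -e 0 fill in the fields that are missing.
full=$(join --header -e 0 -a1 -a2 -j1 -o auto "$tmp/file1.txt" "$tmp/file2.txt")

printf '%s\n' "-v1:" "$only_in_1" "-a1 -a2:" "$full"
rm -r "$tmp"
```

With -v1 the orgA row disappears because it has a partner in file2, which is the opposite of what the question needs.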

The join command inserts matching fields as new rows; how do I produce one row per match?
