Using awk to find the differences and similarities between two text files

Time: 2014-06-25 16:18:22

Tags: bash shell awk scripting

I have two files:

File 1

1
2
34:rt
4

File 2

1
2
34:rt
7

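I want to show the lines that are in file 2 but not in file 1, and vice versa, as well as the values that appear in both text files. So the expected result for these files should look like this: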

1 in both
2 in both
34:rt in both
4 in file 1
7 in file 2

This is what I have so far, but I'm not sure whether this is the right structure:

 awk '
    # First file: remember every line.
    FNR == NR {
        a[$0]++
        next
    }

    # Second file: the line was never seen in file 1.
    !($0 in a) {
        print $0, "in file 2"
    }

    # Second file: the line is also present in file 1.
    ($0 in a) {
        print $0, "in both"
        delete a[$0]  # deletes entries which are processed
    }

    END {
        # Whatever is left in the array was only in file 1.
        for (rest in a) {
            print rest, "in file 1"
        }
    }' "$PWD/file1" "$PWD/file2"

Any suggestions?

1 answer:

Answer 0 (score: 1)

If the order of the output doesn't matter, you can do this:

awk '
NR==FNR { a[$0]++; next }
{
    print $0, ($0 in a ? "in both" : "in file2");
    delete a[$0]
}
END {
    for(x in a) print x, "in file1"
}' file1 file2
1 in both
2 in both
34:rt in both
7 in file2
4 in file1
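
If the order does matter, as in the expected output from the question (file 1's lines first, then whatever is unique to file 2), one possible approach is to hand file2 to awk twice. This is only a sketch under a few assumptions: the function name `compare_in_order`, the `pass` counter, the arrays `in1` and `in2`, and the temporary directory `dir` are hypothetical names introduced here, and neither file may be empty, because the counter advances on the first line of each input file.

```shell
# Hypothetical helper: report lines in file-1 order, then lines unique to file 2.
# Usage: compare_in_order FILE1 FILE2
compare_in_order() {
    awk '
    FNR == 1  { pass++ }                 # a new input file starts
    pass == 1 { in2[$0]; next }          # pass 1 (file2): remember its lines
    pass == 2 {                          # pass 2 (file1): report in file1 order
        in1[$0]
        print $0, (($0 in in2) ? "in both" : "in file1")
        next
    }
    !($0 in in1) { print $0, "in file2" }   # pass 3 (file2): lines only in file2
    ' "$2" "$1" "$2"
}

# Demo with the sample data from the question, written to a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 1 2 34:rt 4 > "$dir/file1"
printf '%s\n' 1 2 34:rt 7 > "$dir/file2"

compare_in_order "$dir/file1" "$dir/file2"
# 1 in both
# 2 in both
# 34:rt in both
# 4 in file1
# 7 in file2
```

Reading file2 twice trades a little I/O for not having to store the output and sort it afterwards.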

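Or use comm, as choroba suggested in the comments. Note that comm expects both inputs to be sorted, and that --output-delimiter is a GNU coreutils option: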

comm --output-delimiter="|" file1 file2 |
awk -F'|' '{print (NF==3 ? $NF " in both" : NF==2 ? $NF " in file2" : $NF " in file1")}'
1 in both
2 in both
34:rt in both
4 in file1
7 in file2
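
Since comm only gives meaningful results on sorted input, a hedged variant for files that are not already sorted could look like the sketch below. It is not part of the original answer: the function name `compare_sorted` and the variables `tmp`, `dir`, and `where` are made up for illustration, it uses comm's default tab delimiter instead of `--output-delimiter` so that it does not depend on GNU coreutils, and it assumes the lines themselves contain no tab characters. `LC_ALL=C` is set so that sort and comm agree on the collation order.

```shell
# Hypothetical helper: sort both inputs, then classify comm's three columns.
# Usage: compare_sorted FILE1 FILE2
compare_sorted() {
    tmp=$(mktemp -d) || return 1
    LC_ALL=C sort "$1" > "$tmp/a"
    LC_ALL=C sort "$2" > "$tmp/b"
    # comm prefixes column 2 with one tab and column 3 with two tabs,
    # so the field count tells us which column a line came from.
    LC_ALL=C comm "$tmp/a" "$tmp/b" |
    awk -F'\t' '{
        where = (NF == 3) ? "in both" : (NF == 2) ? "in file2" : "in file1"
        print $NF, where
    }'
    rm -r "$tmp"
}

# Demo: the question's data, deliberately shuffled.
dir=$(mktemp -d)
printf '%s\n' 4 1 34:rt 2 > "$dir/file1"
printf '%s\n' 7 34:rt 2 1 > "$dir/file2"

compare_sorted "$dir/file1" "$dir/file2"
# 1 in both
# 2 in both
# 34:rt in both
# 4 in file1
# 7 in file2
```

The output follows the sorted order of the lines, not their original order in either file.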