How to compare files whose lines are too similar

Time: 2019-11-06 16:44:07

Tags: c shell awk

I have two text files like this:

Each line has the form SITE.MACHINE.VARIABLE_NAME=VARIABLE_VALUE

CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE=0
CPM-NOMINAL.WAC13.CHRONO_SANSREPONSE_KEEPALIVE=0
DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT=32099
...

They have already been run through sort -u.

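I need to find which lines exist in only one of the two files, and which lines have been modified (I don't care about lines common to both), much like the sdiff command does. But the lines are so similar to each other that diff pairs them up incorrectly.

My idea is to compare the part to the left of the "=" first and, when it matches, check the right-hand side. I am looking for a solution that prints sdiff-like output.

Example of the desired output: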

File1                                                         | File2
CPM-NOMINAL.WAC10.SAR_PARI_SUJET_A_COTES="1:0:1:1:0:0:0:0:0"  | CPM-NOMINAL.WAC10.SAR_PARI_SUJET_A_COTES="1:0:1:1:0:0:0:1:0"
CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE=1              | CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE=0
CPM-NOMINAL.WAC12.PARIS_SANSREPONSE_KEEPALIVE=1               | CPM-NOMINAL.WAC12.PARIS_SANSREPONSE_KEEPALIVE=0
CPM-NOMINAL.WAC12.PARIS_SANS_EMISSION_KEEPALIVE=1             | CPM-NOMINAL.WAC12.PARIS_SANS_EMISSION_KEEPALIVE=0
CPM-NOMINAL.WAC12.PROTOCOLE_PDD=2                             | CPM-NOMINAL.WAC12.PROTOCOLE_PDD=3
                                                              > CPM-NOMINAL.WAC7.SQL_PROC_INIT_XAPDD_MBN_TEST="p_initialiser"
CPM-NOMINAL.WAC8.FAIRE_VERIF_CHAINAGE=FALSE                   | CPM-NOMINAL.WAC8.FAIRE_VERIF_CHAINAGE=TRUE
DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT=3201                    | DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT=32099
DEMO-WEB.WAC7.XN_TCP_SERVICE_SAR_PORT=3201                    | DEMO-WEB.WAC7.XN_TCP_SERVICE_SAR_PORT=3204

Thanks.

2 answers:

Answer 0 (score: 2)

Something similar can be done with join:
$ join -a1 -a2 -e"---" -t= -o1.1,1.2,2.2,2.1 file1 file2 | column -ts=

CPM-NOMINAL.WAC10.SAR_PARI_SUJET_A_COTES         "1:0:1:1:0:0:0:0:0"             "1:0:1:1:0:0:0:1:0"  CPM-NOMINAL.WAC10.SAR_PARI_SUJET_A_COTES
CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE   1                               0                    CPM-NOMINAL.WAC12.CHRONO_SANSREPONSE_KEEPALIVE
CPM-NOMINAL.WAC12.PARIS_SANSREPONSE_KEEPALIVE    1                               0                    CPM-NOMINAL.WAC12.PARIS_SANSREPONSE_KEEPALIVE
CPM-NOMINAL.WAC12.PARIS_SANS_EMISSION_KEEPALIVE  1                               0                    CPM-NOMINAL.WAC12.PARIS_SANS_EMISSION_KEEPALIVE
CPM-NOMINAL.WAC12.PROTOCOLE_PDD                  2                               3                    CPM-NOMINAL.WAC12.PROTOCOLE_PDD
---                                              ---                             "p_initialiser"      CPM-NOMINAL.WAC7.SQL_PROC_INIT_XAPDD_MBN_TEST
CPM-NOMINAL.WAC8.FAIRE_VERIF_CHAINAGE            FALSE                           TRUE                 CPM-NOMINAL.WAC8.FAIRE_VERIF_CHAINAGE
DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT            3201                            32099                DEMO-WEB.WAC7.XN_TCP_SERVICE_PDD_PORT
DEMO-WEB.WAC7.XN_TCP_SERVICE_SAR_PORT            3201                            3204                 DEMO-WEB.WAC7.XN_TCP_SERVICE_SAR_PORT

Rows whose value is the same in both files can be eliminated by piping the output through awk '$2!=$3'.
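The two steps can be combined into one small function. The following is only a sketch under a few assumptions: the name kv_join_diff is hypothetical, values are assumed not to contain "=" or tabs, and both files are re-sorted on the key alone, because join expects its input ordered on the join field and a whole-line sort -u does not always guarantee that. The comparison is forced to be a string comparison so that values such as 1 and 1.0 are not treated as equal.

```shell
# kv_join_diff FILE1 FILE2  (hypothetical helper name)
# Prints "key<TAB>value1<TAB>value2" for every key whose value differs
# between the files or that exists in only one of them.
# "---" marks a value that is missing on that side.
kv_join_diff() {
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 1
    # Sort on the key field only, in the same locale that join will use.
    LC_ALL=C sort -t= -k1,1 "$1" > "$tmp1"
    LC_ALL=C sort -t= -k1,1 "$2" > "$tmp2"
    # -a1 -a2 keeps unpaired lines from both files; -o 0 prints the join key.
    LC_ALL=C join -t= -a1 -a2 -e '---' -o 0,1.2,2.2 "$tmp1" "$tmp2" |
        # Appending "" forces a string comparison instead of a numeric one.
        awk -F= '$2"" != $3"" { printf "%s\t%s\t%s\n", $1, $2, $3 }'
    rm -f "$tmp1" "$tmp2"
}
```

It would be called as kv_join_diff file1 file2. If the values contain no spaces, the result can be piped through column -t for aligned columns.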

Answer 1 (score: 1)

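Here is one possible way to do this with traditional tools and pipes. I use the terms key and value because the lines in the files look like

key = value

The following list of commands gives you possible answers: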

# lines common to file1 and file2 (-x: match whole lines, not substrings)
grep -F -x -f file1 file2
# lines in file2 that are not in file1
grep -v -F -x -f file1 file2
# keys present in both files whose value changed (lines shown as in file2)
# the "=" is kept in the pattern so that a key does not match a longer key
sed 's/=.*/=/' file1 | grep -F -f - <(grep -v -F -x -f file1 file2)
# keys in file2 but not in file1
sed 's/=.*/=/' file1 | grep -v -F -f - file2
# keys in file1 but not in file2
sed 's/=.*/=/' file2 | grep -v -F -f - file1

Or you can do it all in a single simple awk script, which is not the most optimized approach but gives clear output:

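$ awk '
    BEGIN{FS=" *= *"}                  # split on "=", ignoring spaces around it
    {key=$1;value=$2}
    (NR==FNR){a[key]=value; next}      # first file: store values in a
    {b[key] = value }                  # second file: store values in b
    END {
       # keys present in both files: report as common or different
       for (key in a) if (key in b) {
           print (a[key] == b[key] ? "COMM" : "DIFF"), key,"=",a[key],"<=>",b[key]
           delete a[key]
           delete b[key]
       }
       # whatever is left exists in one file only
       for (key in a) {
           print "UNI1", key,"=",a[key]
       }
       for (key in b) {
           print "UNI2", key,"=",b[key]
       }
    }' file1 file2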

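This produces output along these lines (the order of the lines is not guaranteed, because the iteration order of awk's for-in loop is unspecified):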

 COMM key1 = val1 <=> val1
 COMM key2 = val2 <=> val2
 DIFF key3 = val31 <=> val32      
 COMM key4 = val4 <=> val4
 UNI1 key5 = val5
 UNI2 key6 = val6
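Since the question asks for sdiff-style output, the same two-array idea can be adapted to print the original lines side by side with |, < and > markers and to leave out the common lines. This is a sketch, not the answer author's code: the function name kv_sdiff is hypothetical, lines are split at the first "=" only, and values are assumed not to contain tabs.

```shell
# kv_sdiff FILE1 FILE2  (hypothetical helper name)
# Prints tab-separated "left  marker  right" rows, ordered by key:
#   |  key is in both files but the lines differ
#   <  key is only in FILE1
#   >  key is only in FILE2
kv_sdiff() {
    awk '
        {
            # Split at the first "=" only, so values may themselves contain "=".
            pos = index($0, "=")
            if (pos == 0) next                      # ignore lines without a key
            key = substr($0, 1, pos - 1)
        }
        FILENAME == ARGV[1] { a[key] = $0; next }   # lines of FILE1
        { b[key] = $0 }                             # lines of FILE2
        END {
            # The key is emitted as a leading sort column and removed below.
            for (key in a) {
                if (!(key in b))
                    printf "%s\t%s\t<\t\n", key, a[key]
                else if (a[key] != b[key])
                    printf "%s\t%s\t|\t%s\n", key, a[key], b[key]
            }
            for (key in b)
                if (!(key in a))
                    printf "%s\t\t>\t%s\n", key, b[key]
        }' "$1" "$2" |
    LC_ALL=C sort | cut -f2-
}
```

It would be called as kv_sdiff file1 file2. Sorting happens after the comparison, so the input files do not need to be sorted for this variant.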