Diff 2 long strings and write the result to a 3rd file

Time: 2015-06-10 14:02:16

Tags: linux bash file awk

I'm working on my first bash script and got stuck at a point where I need the forum's help.

How can I implement the following in a shell script? (Any suggestions/pointers appreciated!)

Requirement:

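Compare 2 files by matching the KEY contained in each long string, and write to a 3rd file only those long strings whose other attributes differ (for example, the value of USER differs). Some attributes should also be skipped in the comparison.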
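Comparing two records while ignoring some attributes comes down to splitting each record into NAME/VALUE pairs and skipping the listed names. A minimal sketch, wrapped in a hypothetical shell function `records_differ` (not from the original post) that takes the two records plus an optional comma-separated skip list, defaulting to CONFIRM (the attribute the answer below skips), and returns success when the records differ. It assumes records are `;`-separated `NAME=VALUE` pairs without backslashes (awk `-v` interprets backslash escapes).

```shell
# Hypothetical helper: records_differ REC1 REC2 [SKIP_LIST]
# Returns 0 if the records differ in any attribute not listed in SKIP_LIST
# (comma-separated, default CONFIRM), 1 otherwise.
records_differ() {
    awk -v r1="$1" -v r2="$2" -v skip="${3-CONFIRM}" '
    # Split a NAME=VALUE;... record into arr[NAME] = VALUE.
    # Only the first "=" separates name and value, so values may contain "=".
    function parse(rec, arr,    n, i, parts, eq) {
        n = split(rec, parts, ";")
        for (i = 1; i <= n; i++) {
            eq = index(parts[i], "=")
            if (eq) arr[substr(parts[i], 1, eq - 1)] = substr(parts[i], eq + 1)
        }
    }
    BEGIN {
        # Build the set of attribute names to ignore.
        n = split(skip, s, ",")
        for (i = 1; i <= n; i++) skipped[s[i]]
        split("", a); split("", b)    # make sure a and b are arrays
        parse(r1, a); parse(r2, b)
        # Differ if a non-skipped name is missing from b or has another value...
        for (name in a)
            if (!(name in skipped) && (!(name in b) || a[name] != b[name])) exit 0
        # ...or if b has a non-skipped name that a lacks.
        for (name in b)
            if (!(name in skipped) && !(name in a)) exit 0
        exit 1
    }'
}
```

For example, `records_differ "$line1" "$line2" CONFIRM,APRICE && echo different` would ignore both CONFIRM and APRICE.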

Input FILE1 -

AAUTOX=Y;ACCT=;ACTION=C;APRICE=99.975;AQTY=5541;USER=Sam,bpl;CONFIRM=Y;KEY=29976DYE4;DEPT=MYNA-CLCD -- same
AAUTOX=Y;ACCT=;ACTION=C;APRICE=05.975;AQTY=3451;USER=Todd,chr;CONFIRM=N;KEY=29976DYE5;DEPT=MYNA-CLCD -- diff (USER=Todd,chr) write in result file

Input FILE2 -

AAUTOX=Y;ACCT=;ACTION=C;APRICE=99.975;AQTY=5541;USER=Sam,bpl;CONFIRM=Y;KEY=29976DYE4;DEPT=MYNA-CLCD -- same
AAUTOX=Y;ACCT=;ACTION=C;APRICE=05.975;AQTY=3451;USER=Alan,ncr;CONFIRM=N;KEY=29976DYE5;DEPT=MYNA-CLCD -- diff (USER=Alan,ncr) write in result file
AAUTOX=Y;ACCT=;ACTION=C;APRICE=17.000;AQTY=6453;USER=Todd,chr;CONFIRM=N;KEY=29976DYE6;DEPT=MYNA-CLCD -- no match (KEY) found write in result file

Output FILE3:

FILE1:AAUTOX=Y;ACCT=;ACTION=C;APRICE=05.975;AQTY=3451;USER=Todd,chr;CONFIRM=N;KEY=29976DYE5;DEPT=MYNA-CLCD 
FILE2:AAUTOX=Y;ACCT=;ACTION=C;APRICE=05.975;AQTY=3451;USER=Alan,ncr;CONFIRM=N;KEY=29976DYE5;DEPT=MYNA-CLCD


FILE1:

FILE2: AAUTOX=Y;ACCT=;ACTION=C;APRICE=17.000;AQTY=6453;USER=Todd,chr;CONFIRM=N;KEY=29976DYE6;DEPT=MYNA-CLCD

...and so on, one block for each differing line.

The approach I have in mind (a first cut, to be improved later):

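  • Read FILE1 line by line (awk or read??), and for each line:
    • a) Read FILE2 to find the matching unique "KEY" (which command should I use here??? I could grep FILE2 for the KEY, but then how do I split the line into fields for comparison?)
    • b) Compare each field of FILE1.LINE1 with the matching FILE2 line and, if they differ, write both to the 3rd result file (awk splits a line into fields $1, $2, ... that can be compared, but I don't know how to do that with the "read" command???)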
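The read/grep approach outlined in the list can be sketched in plain bash. The function names `record_fields` and `diff_records` are hypothetical. The sketch assumes each KEY occurs at most once per file and is followed by another `;`-separated field (so `KEY=value;` identifies the record), and that CONFIRM is the only attribute to skip. Step b) is handled by `IFS=';' read -ra`, which splits a line into an array of `NAME=VALUE` pairs.

```shell
# Hypothetical helper: print the NAME=VALUE pairs of one record, one per
# line and sorted (so attribute order does not matter), leaving out CONFIRM.
record_fields() {
    local IFS=';' pair
    local -a pairs=()
    read -ra pairs <<< "$1"          # split the record on ';'
    for pair in "${pairs[@]}"; do
        [[ ${pair%%=*} == CONFIRM ]] || printf '%s\n' "$pair"
    done | sort
}

# Hypothetical helper: diff_records FILE1 FILE2 (writes the report to stdout).
diff_records() {
    local file1=$1 file2=$2 line1 line2 key
    # Steps a) and b): look up each FILE1 record's KEY in FILE2 and compare.
    while IFS= read -r line1; do
        [[ -z $line1 ]] && continue
        key=${line1#*KEY=}; key=${key%%;*}            # text between KEY= and ;
        line2=$(grep -F -m1 -- "KEY=$key;" "$file2")  # empty if no match
        if [[ $(record_fields "$line1") != "$(record_fields "$line2")" ]]; then
            printf 'FILE1: %s\nFILE2: %s\n\n' "$line1" "$line2"
        fi
    done < "$file1"
    # Records whose KEY appears only in FILE2.
    while IFS= read -r line2; do
        [[ -z $line2 ]] && continue
        key=${line2#*KEY=}; key=${key%%;*}
        grep -qF -- "KEY=$key;" "$file1" ||
            printf 'FILE1: \nFILE2: %s\n\n' "$line2"
    done < "$file2"
}
```

Usage would be `diff_records FILE1 FILE2 > FILE3`. Because grep runs once per line, FILE2 is rescanned for every FILE1 record; that is fine for small files, but a single awk pass (as in the answer) scales better.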

1 Answer:

Answer 0 (score: 1)

This uses GNU awk 4.* for sorted_in (see http://www.gnu.org/software/gawk/manual/gawk.html#Controlling-Array-Traversal); with other awks you can pipe the output to sort or determine the key order some other way:

$ cat tst.awk
BEGIN { FS="[;=]" }
NF {
    # Split the record into NAME/VALUE pairs and index it by its KEY
    delete name2val
    for (i=1; i<=NF; i+=2) { name2val[$i] = $(i+1); names[$i] }
    key = name2val["KEY"]
    keys[key]
    recs[key,FILENAME] = $0
    for (name in name2val) { vals[key,FILENAME,name] = name2val[name] }
}
END {
    PROCINFO["sorted_in"] = "@ind_str_asc"
    file1 = ARGV[1]
    file2 = ARGV[2]
    for (key in keys) {
        state = "SAME"
        if ( (key,file1) in recs ) {
            if ( (key,file2) in recs ) {
                # Compare every attribute name seen in either file, except CONFIRM
                for (name in names) {
                    if (name != "CONFIRM") {
                        if (vals[key,file1,name] != vals[key,file2,name]) {
                            state = "DIFF"
                        }
                    }
                }
            } else { state = "FILE1_ONLY" }
        } else { state = "FILE2_ONLY" }

        if (state != "SAME") {
            print file1":", recs[key,file1]
            print file2":", recs[key,file2]
            print ""
        }
    }
}

$ gawk -f tst.awk FILE1 FILE2
FILE1: AAUTOX=Y;ACCT=;ACTION=C;APRICE=05.975;AQTY=3451;USER=Todd,chr;CONFIRM=N;KEY=29976DYE5;DEPT=MYNA-CLCD -- diff (USER=Todd,chr) write in result file
FILE2: AAUTOX=Y;ACCT=;ACTION=C;APRICE=05.975;AQTY=3451;USER=Alan,ncr;CONFIRM=N;KEY=29976DYE5;DEPT=MYNA-CLCD -- diff (USER=Alan,ncr) write in result file

FILE1: 
FILE2: AAUTOX=Y;ACCT=;ACTION=C;APRICE=17.000;AQTY=6453;USER=Todd,chr;CONFIRM=N;KEY=29976DYE6;DEPT=MYNA-CLCD -- no match (KEY) found write in result file
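For awks without `PROCINFO["sorted_in"]`, the answer suggests piping the output to sort. One way to do that, sketched as a hypothetical shell function `diff_keyed_portable`: it applies the same comparison (every attribute name seen in either file, except CONFIRM), prefixes each output line with its KEY and its position within the block, and lets `sort` and `cut` restore key order. It assumes a POSIX awk, `sort` and `cut`, and records that contain no tab characters.

```shell
# Hypothetical helper: diff_keyed_portable FILE1 FILE2 (report on stdout).
diff_keyed_portable() {
    awk '
    BEGIN { FS = "[;=]"; OFS = "\t" }
    NF {
        # First pass over the fields: find the KEY of this record.
        key = ""
        for (i = 1; i <= NF; i += 2) if ($i == "KEY") key = $(i + 1)
        keys[key]
        recs[key, FILENAME] = $0
        # Second pass: store each NAME=VALUE pair and remember every NAME.
        for (i = 1; i <= NF; i += 2) { vals[key, FILENAME, $i] = $(i + 1); names[$i] }
    }
    END {
        f1 = ARGV[1]; f2 = ARGV[2]
        for (key in keys) {
            # A key missing from either file always counts as a difference.
            differs = !((key, f1) in recs) || !((key, f2) in recs)
            for (name in names)
                if (name != "CONFIRM" && vals[key, f1, name] != vals[key, f2, name])
                    differs = 1
            if (differs) {
                # Prefix each line with the key and its position in the block.
                print key, 1, f1 ": " recs[key, f1]
                print key, 2, f2 ": " recs[key, f2]
                print key, 3, ""
            }
        }
    }' "$1" "$2" | LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n | cut -f3-
}
```

Usage would be `diff_keyed_portable FILE1 FILE2 > FILE3`. `LC_ALL=C` makes sort compare bytes, which roughly mirrors gawk's `@ind_str_asc` order.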