Comparing two files and printing the required result using AWK

Time: 2017-12-22 10:44:09

Tags: bash shell awk compare

My files are as follows:

File 1:

COL1|COL2|COL3|COL4|COL5

'SR'|'2017-09-01 00:19:13'|'+05:30'|'1A3LA7015L5S'|'5042449536906016501541'

'SR'|'2017-09-01 00:19:20'|'+05:30'|'1A3LA7015L6I'|'5042449603146028701550'

'SR'|'2017-09-01 00:19:23'|'+05:30'|'1A3LA7015L6I'|'5042449603146028701555'

File 2:

COL1|COL2|COL3|COL4|COL5

'SR'|'2017-09-01 00:19:13'|'+05:30'|'1A3LA7015L5Q'|'5042449536906016501541'

'SR'|'2017-09-01 00:19:20'|'+05:30'|'1A3LA7015L6I'|'5042449603146028701550'

'SR'|'2017-09-01 00:19:20'|'+05:30'|'1A3LA7015L6I'|'5042449603146028701555'

Here the primary key is my 5th column; its column number is always stored in the variable $var.

The output I want is as follows:

PrimaryKey|Column|File1Value|File2Value

'5042449536906016501541'|COL4|'1A3LA7015L5S'|'1A3LA7015L5Q'
'5042449603146028701555'|COL2|'2017-09-01 00:19:23'|'2017-09-01 00:19:20'

I tried the following code:

paste -d '|' File1 File2 | awk -F '|' -v col="pk1" \
    '{c=NF/2;for(i=1;i<=c;i++)if($i!=$(i+c))printf "%s|%s|%s|%s \n",$(i+$col+1),$i,$(i+c),$i-$(i+c)}'

However, this does not work as expected.

1 answer:

Answer 0: (score: 1)

With GNU AWK, a script like this might work:

script.awk

BEGIN     { # setup file separator and sorting:
            FS=OFS="|" 
            PROCINFO["sorted_in"]="@ind_str_asc"
          }

# skip header lines
FNR == 1  { next }

# store first file
(FNR==NR) { f1[$5]=$0
            # skip processing of other rules and 
            # read the next line from input
            next
          }

# store second file
          { f2[$5]=$0
            if( ! ($5 in f1)) {
                f1[$5] = ""
            }
          }

# compare and print stored information
END       { print "PrimaryKey", "Column", "File1Value", "File2Value"
            for( k in f1) {
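                # split() returns the number of fields; loop over fields,
                # not over the character length of the stored line
                n1 = split( f1[k], arr1, "|")
                n2 = split( f2[k], arr2, "|")
                n = (n1 > n2) ? n1 : n2
                for( c = 1; c <= n; c++ ) {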
                    if( arr1[c] != arr2[c] ) {
                        print k, "COL" c, arr1[c], arr2[c] 
                    }
                }
            }
          }

You can run it with a command like this: awk -f script.awk yourfile1 yourfile2
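
The question mentions that the key column number lives in a shell variable ($var), while the script above hard-codes $5 and the COL prefix. The following is only a sketch of one possible variation, not the answer's method: it passes the key column in with -v and takes the column names from the header line. The temporary directory and the names pk, name, val and seen are assumptions made for this example. It uses plain awk features only, reports differences in the order of the second file, and ignores keys that appear in just one file.

```shell
#!/bin/sh
# Sketch only: the temp files and the awk names pk, name, val, seen
# are hypothetical and exist just for this illustration.
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
COL1|COL2|COL3|COL4|COL5
'SR'|'2017-09-01 00:19:13'|'+05:30'|'1A3LA7015L5S'|'5042449536906016501541'
'SR'|'2017-09-01 00:19:20'|'+05:30'|'1A3LA7015L6I'|'5042449603146028701550'
'SR'|'2017-09-01 00:19:23'|'+05:30'|'1A3LA7015L6I'|'5042449603146028701555'
EOF

cat > "$tmp/file2" <<'EOF'
COL1|COL2|COL3|COL4|COL5
'SR'|'2017-09-01 00:19:13'|'+05:30'|'1A3LA7015L5Q'|'5042449536906016501541'
'SR'|'2017-09-01 00:19:20'|'+05:30'|'1A3LA7015L6I'|'5042449603146028701550'
'SR'|'2017-09-01 00:19:20'|'+05:30'|'1A3LA7015L6I'|'5042449603146028701555'
EOF

var=5   # number of the primary-key column, as described in the question

result=$(awk -v pk="$var" '
BEGIN {
    FS = OFS = "|"
    print "PrimaryKey", "Column", "File1Value", "File2Value"
}
# Header line of each file: keep the names from the first one, then skip.
FNR == 1 {
    if (NR == 1) {
        for (i = 1; i <= NF; i++) name[i] = $i
    }
    next
}
# First file: remember every field of the row under its key.
FNR == NR {
    seen[$pk] = 1
    for (i = 1; i <= NF; i++) val[$pk, i] = $i
    next
}
# Second file: report each field that differs from the stored row.
($pk in seen) {
    for (c = 1; c <= NF; c++) {
        # Appending "" forces a string comparison even for numeric fields.
        if ((val[$pk, c] "") != ($c "")) {
            print $pk, name[c], val[$pk, c], $c
        }
    }
}
' "$tmp/file1" "$tmp/file2")

rm -rf "$tmp"
printf '%s\n' "$result"
```

With the sample data from the question, this prints the header followed by the COL4 difference for key '5042449536906016501541' and the COL2 difference for key '5042449603146028701555'. Comparing while the second file is being read avoids keeping both files in memory, at the cost of not reporting rows that exist in only one file.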