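Compare two files on two columns and remove the matching lines from the second file in a shell script

Time: 2014-12-11 06:52:57

Tags: shell

Here is an example that illustrates my question: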

FILE1.TXT:

A|6359454|102951|FAR|976391300|12/02/2014 12:00:00 AM|2|12/02/2014 09:43:42 AM
B|6353591|102952|HEN|42217A106|11/30/2014 12:00:00 AM|10|12/02/2014 12:25:16 AM
A|6358494|102952|HEN|42217A106|12/02/2014 12:00:00 AM|10|12/02/2014 02:04:23 PM
A|6358496|102983|NAI|63633D104|12/02/2014 12:00:00 AM|6|12/02/2014 12:59:04 PM
B|6347496|102999|ACB|69360B107|11/28/2014 12:00:00 AM|1|12/02/2014 05:59:23 AM
A|6359347|102999|ACB|69360B107|12/02/2014 12:00:00 AM|2|12/02/2014 05:59:23 AM
C|6337344|103010|OAC|22002T108|11/25/2014 12:00:00 AM|10|12/01/2014 08:48:01 AM

FILE2.TXT:

B|6359454|102951|FAR|976391300|12/02/2014 12:00:00 AM|2|12/02/2014 09:43:42 AM
B|6353591|102952|HEN|42217A106|11/30/2014 12:00:00 AM|10|12/02/2014 12:25:16 AM
A|6358494|102952|HEN|42217A106|12/02/2014 12:00:00 AM|10|12/02/2014 02:04:23 PM
C|6337344|103010|OAC|22002T108|11/25/2014 12:00:00 AM|10|12/01/2014 08:48:01 AM
A|6358496|102983|NAI|63633D104|12/02/2014 12:00:00 AM|6|12/02/2014 12:59:04 PM
B|6353613|103061|SAT|875465106|11/30/2014 12:00:00 AM|7|12/01/2014 07:22:18 PM
A|6355261|103061|SAT|875465106|12/01/2014 12:00:00 AM|7|12/01/2014 07:22:18 PM
B|6347496|102999|ACB|69360B107|11/28/2014 12:00:00 AM|1|12/02/2014 05:59:23 AM
A|6358506|103060|PQS|737464107|12/02/2014 12:00:00 AM|9|12/02/2014 04:24:43 AM
C|6337352|103065|OAI|681936100|11/25/2014 12:00:00 AM|6|11/26/2014 04:30:42 AM
C|6359347|102999|ACB|69360B107|12/02/2014 12:00:00 AM|2|12/02/2014 05:59:23 AM

Expected output:

File3.txt:

B|6353613|103061|SAT|875465106|11/30/2014 12:00:00 AM|7|12/01/2014 07:22:18 PM
A|6355261|103061|SAT|875465106|12/01/2014 12:00:00 AM|7|12/01/2014 07:22:18 PM
A|6358506|103060|PQS|737464107|12/02/2014 12:00:00 AM|9|12/02/2014 04:24:43 AM
C|6337352|103065|OAI|681936100|11/25/2014 12:00:00 AM|6|11/26/2014 04:30:42 AM

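I want to compare file1 and file2 on columns 4 and 6, and remove from file2 every entire line whose values in those two columns match a line in file1. I also want to save the result to a third file.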

I tried:

awk 'FNR==NR{a[$4,$6];next};!(a[$4,$6])' file1.txt file2.txt > file3.txt

But the output is just the whole of the second file.

Thanks in advance.

2 Answers:

Answer 0: (score: 0)

One way:

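perl -we ' my %seen;
           # First pass: record each (column 4, column 6) pair from file 1.
           open my $fh1, "<", $ARGV[0] or die "Cannot open $ARGV[0]: $!";
           while (<$fh1>) {
               my @fields = split /[|]/, $_;
               ++$seen{"$fields[3]|$fields[5]"};
           }
           close $fh1;
           # Second pass: print only the file 2 lines whose pair was not seen.
           open my $fh2, "<", $ARGV[1] or die "Cannot open $ARGV[1]: $!";
           while (<$fh2>) {
               my @fields = split /[|]/, $_;
               print unless $seen{"$fields[3]|$fields[5]"};
           }
           close $fh2;
' File1.txt File2.txt > File3.txt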

Answer 1: (score: 0)

I used the command below and got the correct output.

awk -F'|' 'NR == FNR { a[$4,$6]++; next } !(a[$4,$6])' file1.txt file2.txt > file3.txt
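For anyone wondering why the attempt in the question printed every line of file2, here is a rough sketch on a cut-down copy of the sample data. The names f1.txt, f2.txt, f3.txt and the scratch directory are made up for the demo. Two things seem to be going on: without -F'|' awk splits on whitespace, so $4 and $6 are not the pipe-delimited columns; and a[$4,$6] with no ++ stores an empty value, so !(a[$4,$6]) is true for every line. The sketch uses the `in` operator, which tests only whether the key exists, as one possible alternative to the ++ counter.

```shell
#!/bin/sh
# Work in a scratch directory so no real files are touched (demo assumption).
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1

# Cut-down sample: f1.txt holds the keys to remove, f2.txt is the file to filter.
cat > f1.txt <<'EOF'
A|6359454|102951|FAR|976391300|12/02/2014 12:00:00 AM|2|12/02/2014 09:43:42 AM
B|6353591|102952|HEN|42217A106|11/30/2014 12:00:00 AM|10|12/02/2014 12:25:16 AM
EOF

cat > f2.txt <<'EOF'
B|6359454|102951|FAR|976391300|12/02/2014 12:00:00 AM|2|12/02/2014 09:43:42 AM
B|6353613|103061|SAT|875465106|11/30/2014 12:00:00 AM|7|12/01/2014 07:22:18 PM
A|6358506|103060|PQS|737464107|12/02/2014 12:00:00 AM|9|12/02/2014 04:24:43 AM
EOF

# Default whitespace splitting: $4 is a time of day, not the fourth column.
head -n 1 f1.txt | awk '{ print $4 }'          # prints 09:43:42
# Pipe as field separator: $4 is the intended column.
head -n 1 f1.txt | awk -F'|' '{ print $4 }'    # prints FAR

# First pass (NR == FNR): remember each (column 4, column 6) pair from f1.txt.
# Second pass: print only the f2.txt lines whose pair was never stored.
awk -F'|' 'NR == FNR { seen[$4,$6]; next } !(($4,$6) in seen)' f1.txt f2.txt > f3.txt

cat f3.txt
```

On this sample the FAR line is dropped because its columns 4 and 6 match a line in f1.txt, and the SAT and PQS lines are kept. A side benefit of `in` is that it does not create new array entries while file2 is being scanned.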

Thanks.