Join 2 files on the first field

Time: 2013-07-10 12:32:40

Tags: join awk gawk mawk nawk

I want to compare two files and, for every row where file1 $1 equals file2 $1, print file1 $1,$2,$3,$4,$5, followed by file2 $2,$5, and the difference file1 $5 - file2 $5.

Input file1.txt

1,raja,AP,NIND,14:51:56.46
2,mona,KR,SIND,12:41:46.36
3,JO,TM,SIND,18:31:56.36
4,andrew,sind,13:43:23.12
5,drew,sind,17:53:53.42

Input file2.txt

5,raju,UP,NIND,11:51:56.46
6,NAG,KR,SIND,12:41:46.36
7,JO,TM,SIND,18:31:56.36
8,andrew,sind,kkd,14:43:23.12
4,andrew,sind,ggf,15:53:53.42
10,asJO,TM,SIND,16:31:56.36
3,sandrew,sind,gba,9:43:23.12
2,xcandrew,sind,sds,6:53:53.42
1,cv,GTM,SIND,5:31:56.36
9,mnJO,TM,SIND,2:20:56.36

Output:

1,raja,AP,NIND,14:51:56.46,cv,5:31:56.36
2,mona,KR,SIND,12:41:46.36,xcandrew,6:53:53.42
3,JO,TM,SIND,18:31:56.36,sandrew,9:43:23.12
4,andrew,sind,13:43:23.12,andrew,15:53:53.42
5,drew,sind,17:53:53.42,raju,11:51:56.46

1 answer:

Answer 0: (score: 2)

Using awk, you can do:

$ awk 'NR==FNR{a[$1]=$0;next}$1 in a{print a[$1],$2,$5}' FS=, OFS=, f1 f2
5,drew,sind,17:53:53.42,raju,11:51:56.46
4,andrew,sind,13:43:23.12,andrew,15:53:53.42
3,JO,TM,SIND,18:31:56.36,sandrew,9:43:23.12
2,mona,KR,SIND,12:41:46.36,xcandrew,6:53:53.42
1,raja,AP,NIND,14:51:56.46,cv,5:31:56.36

If you want the output sorted, pipe it to sort:

$ awk 'NR==FNR{a[$1]=$0;next}$1 in a{print a[$1],$2,$5}' FS=, OFS=, f1 f2 | sort
1,raja,AP,NIND,14:51:56.46,cv,5:31:56.36
2,mona,KR,SIND,12:41:46.36,xcandrew,6:53:53.42
3,JO,TM,SIND,18:31:56.36,sandrew,9:43:23.12
4,andrew,sind,13:43:23.12,andrew,15:53:53.42
5,drew,sind,17:53:53.42,raju,11:51:56.46
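
The question also asks for the difference file1 $5 - file2 $5, which the awk commands above do not print. The following is a minimal sketch of one way to add it, under two assumptions that are not stated in the question: the difference is wanted in seconds, and the time is always the last field of a row (rows 4 and 5 of file1.txt have only four fields, so $5 is empty there and $NF is used instead). The helper name secs and the array names line and time are illustrative only.

```shell
# Recreate the sample data from the question in a scratch directory.
dir=$(mktemp -d)
cat > "$dir/f1" <<'EOF'
1,raja,AP,NIND,14:51:56.46
2,mona,KR,SIND,12:41:46.36
3,JO,TM,SIND,18:31:56.36
4,andrew,sind,13:43:23.12
5,drew,sind,17:53:53.42
EOF
cat > "$dir/f2" <<'EOF'
5,raju,UP,NIND,11:51:56.46
6,NAG,KR,SIND,12:41:46.36
7,JO,TM,SIND,18:31:56.36
8,andrew,sind,kkd,14:43:23.12
4,andrew,sind,ggf,15:53:53.42
10,asJO,TM,SIND,16:31:56.36
3,sandrew,sind,gba,9:43:23.12
2,xcandrew,sind,sds,6:53:53.42
1,cv,GTM,SIND,5:31:56.36
9,mnJO,TM,SIND,2:20:56.36
EOF

out=$(awk -F, '
    # Convert hh:mm:ss.ss to a number of seconds (p is a local array).
    function secs(t,   p) { split(t, p, ":"); return p[1]*3600 + p[2]*60 + p[3] }

    # First file: remember the whole line and its last field, keyed by field 1.
    NR == FNR { line[$1] = $0; time[$1] = $NF; next }

    # Second file: for matching keys, append name, time and the difference.
    $1 in line {
        printf "%s,%s,%s,%.2f\n", line[$1], $2, $NF, secs(time[$1]) - secs($NF)
    }
' "$dir/f1" "$dir/f2" | sort -t, -k1,1n)   # numeric sort on the key

printf '%s\n' "$out"
rm -r "$dir"
```

A negative value in the last column means the file2 time is later than the file1 time for that key.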

Alternatively, using join:

$ join -j1 -t, -o 1.1,1.2,1.3,1.4,1.5,2.2,2.5 <(sort f1) <(sort f2)
1,raja,AP,NIND,14:51:56.46,cv,5:31:56.36
2,mona,KR,SIND,12:41:46.36,xcandrew,6:53:53.42
3,JO,TM,SIND,18:31:56.36,sandrew,9:43:23.12
4,andrew,sind,13:43:23.12,,andrew,15:53:53.42
5,drew,sind,17:53:53.42,,raju,11:51:56.46
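
join expects both inputs to be sorted on the join field, whereas a plain sort orders whole lines, and in some locales that may not give the same order as the key alone. A hedged variant of the join command, assuming it is acceptable to sort on field 1 only and to force the C locale for both tools, could look like this. It uses temporary files instead of process substitution so that it also runs in a plain POSIX shell; the file names f1.sorted and f2.sorted are illustrative.

```shell
# Recreate the sample data from the question in a scratch directory.
dir=$(mktemp -d)
cat > "$dir/f1" <<'EOF'
1,raja,AP,NIND,14:51:56.46
2,mona,KR,SIND,12:41:46.36
3,JO,TM,SIND,18:31:56.36
4,andrew,sind,13:43:23.12
5,drew,sind,17:53:53.42
EOF
cat > "$dir/f2" <<'EOF'
5,raju,UP,NIND,11:51:56.46
6,NAG,KR,SIND,12:41:46.36
7,JO,TM,SIND,18:31:56.36
8,andrew,sind,kkd,14:43:23.12
4,andrew,sind,ggf,15:53:53.42
10,asJO,TM,SIND,16:31:56.36
3,sandrew,sind,gba,9:43:23.12
2,xcandrew,sind,sds,6:53:53.42
1,cv,GTM,SIND,5:31:56.36
9,mnJO,TM,SIND,2:20:56.36
EOF

# Sort each file on field 1 only; LC_ALL=C makes sort and join
# agree on a byte-wise ordering of the keys.
LC_ALL=C sort -t, -k1,1 "$dir/f1" > "$dir/f1.sorted"
LC_ALL=C sort -t, -k1,1 "$dir/f2" > "$dir/f2.sorted"

# Join on field 1 of both files and pick the requested output fields.
joined=$(LC_ALL=C join -t, -1 1 -2 1 \
    -o 1.1,1.2,1.3,1.4,1.5,2.2,2.5 \
    "$dir/f1.sorted" "$dir/f2.sorted")

printf '%s\n' "$joined"
rm -r "$dir"
```

Rows 4 and 5 of file1.txt have only four fields, so 1.5 is empty for them and the output shows two consecutive commas at that position.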