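I have two large files with many rows and columns. Both files contain a tag column; in one file the tags are unique, while in the other they are repeated.
They look like this: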
FILE1:
stat stat P-value tag
0.3049 7.464 1.875e-11 L2_None_chr1_-_109092036
0.2961 7.448 2.105e-11 L2_None_chr1_-_109092036
0.2934 7.347 3.389e-11 L2_None_chr1_-_109092036
0.2961 7.245 5.668e-11 L2_None_chr1_-_109092036
0.6682 7.284 4.664e-11 L2_None_chr1_-_109957962
0.6682 7.284 4.664e-11 L2_None_chr1_-_109957962
0.3933 7.363 3.127e-11 L2_None_chr1_-_159842839
0.3808 7.284 4.672e-11 L2_None_chr1_-_159842839
0.2993 7.17 8.278e-11 L2_None_chr1_-_169972458
0.3312 7.817 3.075e-12 L2_None_chr1_-_203626998
0.3312 7.817 3.075e-12 L2_None_chr1_-_203626998
0.614 7.616 9.742e-12 L2_None_chr1_-_569826
0.6411 7.58 1.037e-11 L2_None_chr1_-_569826
0.5755 7.275 4.871e-11 L2_None_chr1_-_569826
0.6893 7.26 5.255e-11 L2_None_chr1_-_6546011
0.3136 7.529 1.35e-11 L2_None_chr1_-_91180355
0.3262 7.449 2.023e-11 L2_None_chr1_-_91180355
0.298 7.151 9.129e-11 L2_None_chr1_-_91180355
0.2999 7.149 9.201e-11 L2_None_chr1_-_91182695
0.5383 7.189 7.534e-11 L2_None_chr1_-_91183491
FILE2:
L2_None_chr1_-_109092036 chr1 109092034
L2_None_chr1_-_109957962 chr1 109957879
L2_None_chr1_-_159842839 chr1 159842779
L2_None_chr1_-_169972458 chr1 169972444
L2_None_chr1_-_203626998 chr1 203626983
L2_None_chr1_-_569826 chr1 569802
L2_None_chr1_-_6546011 chr1 6545930
L2_None_chr1_-_91180355 chr1 91180310
L2_None_chr1_-_91182695 chr1 91182572
L2_None_chr1_-_91183491 chr1 91183389
What I want:
stat P-value tag tag chr bp
7.464 1.875e-11 L2_None_chr1_-_109092036 L2_None_chr1_-_109092036 1 109092036
7.448 2.105e-11 L2_None_chr1_-_109092036 L2_None_chr1_-_109092036 1 109092036
7.347 3.389e-11 L2_None_chr1_-_109092036 L2_None_chr1_-_109092036 1 109092036
7.245 5.668e-11 L2_None_chr1_-_109092036 L2_None_chr1_-_109092036 1 109092036
7.284 4.664e-11 L2_None_chr1_-_109957962 L2_None_chr1_-_109957962 1 109957962
7.284 4.664e-11 L2_None_chr1_-_109957962 L2_None_chr1_-_109957962 1 109957962
7.363 3.127e-11 L2_None_chr1_-_159842839 L2_None_chr1_-_159842839 1 159842839
7.284 4.672e-11 L2_None_chr1_-_159842839 L2_None_chr1_-_159842839 1 159842839
7.17 8.278e-11 L2_None_chr1_-_169972458 L2_None_chr1_-_169972458 1 169972458
7.817 3.075e-12 L2_None_chr1_-_203626998 L2_None_chr1_-_203626998 1 203626998
7.817 3.075e-12 L2_None_chr1_-_203626998 L2_None_chr1_-_203626998 1 203626998
7.616 9.742e-12 L2_None_chr1_-_569826 L2_None_chr1_-_569826 1 569826
7.58 1.037e-11 L2_None_chr1_-_569826 L2_None_chr1_-_569826 1 569826
7.275 4.871e-11 L2_None_chr1_-_569826 L2_None_chr1_-_569826 1 569826
7.26 5.255e-11 L2_None_chr1_-_6546011 L2_None_chr1_-_6546011 1 6546011
7.529 1.35e-11 L2_None_chr1_-_91180355 L2_None_chr1_-_91180355 1 91180355
7.449 2.023e-11 L2_None_chr1_-_91180355 L2_None_chr1_-_91180355 1 91180355
7.151 9.129e-11 L2_None_chr1_-_91180355 L2_None_chr1_-_91180355 1 91180355
7.149 9.201e-11 L2_None_chr1_-_91182695 L2_None_chr1_-_91182695 1 91182695
7.189 7.534e-11 L2_None_chr1_-_91183491 L2_None_chr1_-_91183491 1 91183491
I tried the match function in R, but it didn't quite get me there...
Answer 0 (score: 2)
This should do it (assuming dat and dat1 are the data frames read from FILE1 and FILE2):
merge(dat,dat1,by.x='tag',by.y='tag')
tag stat stat.1 P.value V2 V3
1 L2_None_chr1_-_109092036 0.3049 7.464 1.875e-11 chr1 109092034
2 L2_None_chr1_-_109092036 0.2961 7.448 2.105e-11 chr1 109092034
3 L2_None_chr1_-_109092036 0.2934 7.347 3.389e-11 chr1 109092034
4 L2_None_chr1_-_109092036 0.2961 7.245 5.668e-11 chr1 109092034
5 L2_None_chr1_-_109957962 0.6682 7.284 4.664e-11 chr1 109957879
6 L2_None_chr1_-_109957962 0.6682 7.284 4.664e-11 chr1 109957879
7 L2_None_chr1_-_159842839 0.3933 7.363 3.127e-11 chr1 159842779
8 L2_None_chr1_-_159842839 0.3808 7.284 4.672e-11 chr1 159842779
9 L2_None_chr1_-_169972458 0.2993 7.170 8.278e-11 chr1 169972444
10 L2_None_chr1_-_203626998 0.3312 7.817 3.075e-12 chr1 203626983
11 L2_None_chr1_-_203626998 0.3312 7.817 3.075e-12 chr1 203626983
12 L2_None_chr1_-_569826 0.6140 7.616 9.742e-12 chr1 569802
13 L2_None_chr1_-_569826 0.6411 7.580 1.037e-11 chr1 569802
14 L2_None_chr1_-_569826 0.5755 7.275 4.871e-11 chr1 569802
15 L2_None_chr1_-_6546011 0.6893 7.260 5.255e-11 chr1 6545930
16 L2_None_chr1_-_91180355 0.3136 7.529 1.350e-11 chr1 91180310
17 L2_None_chr1_-_91180355 0.3262 7.449 2.023e-11 chr1 91180310
18 L2_None_chr1_-_91180355 0.2980 7.151 9.129e-11 chr1 91180310
19 L2_None_chr1_-_91182695 0.2999 7.149 9.201e-11 chr1 91182572
20 L2_None_chr1_-_91183491 0.5383 7.189 7.534e-11 chr1 91183389
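For readers working outside R, the same tag lookup can be sketched with awk. This is only an illustration under stated assumptions: the files are whitespace-separated, the tag is column 4 of the stats file and column 1 of the annotation file, and the function name lookup_by_tag and the file names are hypothetical. Unlike merge, which sorts the result by tag by default, this keeps the row order of the stats file.

```shell
# Sketch: append chr and position from the annotation file to every stats row
# with the same tag, keeping the row order of the stats file.
# lookup_by_tag and the file names below are hypothetical.
lookup_by_tag() {
    # $1: annotation file (tag in column 1), $2: stats file (tag in column 4)
    awk '
        NR == FNR { extra[$1] = $2 " " $3; next }  # first file: remember chr and position per tag
        $4 in extra { print $0, extra[$4] }        # second file: append them to matching rows
    ' "$1" "$2"
}

# Small demo on a few rows of the sample data.
demo_dir=$(mktemp -d)
printf '%s\n' \
    'stat stat P-value tag' \
    '0.3049 7.464 1.875e-11 L2_None_chr1_-_109092036' \
    '0.6682 7.284 4.664e-11 L2_None_chr1_-_109957962' \
    '0.2961 7.448 2.105e-11 L2_None_chr1_-_109092036' > "$demo_dir/file1.txt"
printf '%s\n' \
    'L2_None_chr1_-_109092036 chr1 109092034' \
    'L2_None_chr1_-_109957962 chr1 109957879' > "$demo_dir/file2.txt"

lookup_by_tag "$demo_dir/file2.txt" "$demo_dir/file1.txt"
```

The header row of the stats file is dropped here because its fourth field, tag, has no entry in the annotation file.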
Answer 1 (score: 1)
You may be looking for the Linux join command. man join is a good place to start; your command would look something like this:
join -1 4 -2 1 <(sort -k4,4 FILE1) <(sort -k1,1 FILE2)
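-1 and -2 specify which field in each file is used for matching. join expects both inputs to be sorted on that field; if the files are already sorted that way, sort is not needed.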
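A minimal sketch of this approach on a few rows of the sample data, assuming whitespace-separated files. The function name join_on_tag and the file names are hypothetical. It uses temporary files instead of process substitution so that it also runs in a plain POSIX shell, and it fixes the locale so that sort and join agree on ordering.

```shell
# Sketch: join a stats file (tag in column 4) with an annotation file
# (tag in column 1). join_on_tag and the file names are hypothetical.
join_on_tag() {
    # $1: stats file, $2: annotation file
    join_tmp=$(mktemp -d)
    # Each input must be sorted on its own join field, in the same collation order.
    LC_ALL=C sort -k4,4 "$1" > "$join_tmp/a"
    LC_ALL=C sort -k1,1 "$2" > "$join_tmp/b"
    # Output: the tag, then the other columns of $1, then the other columns of $2.
    LC_ALL=C join -1 4 -2 1 "$join_tmp/a" "$join_tmp/b"
    rm -rf "$join_tmp"
}

# Small demo; the header row of the stats file has no partner and is dropped.
join_dir=$(mktemp -d)
printf '%s\n' \
    'stat stat P-value tag' \
    '0.3049 7.464 1.875e-11 L2_None_chr1_-_109092036' \
    '0.6682 7.284 4.664e-11 L2_None_chr1_-_109957962' \
    '0.2961 7.448 2.105e-11 L2_None_chr1_-_109092036' > "$join_dir/file1.txt"
printf '%s\n' \
    'L2_None_chr1_-_109957962 chr1 109957879' \
    'L2_None_chr1_-_109092036 chr1 109092034' > "$join_dir/file2.txt"

join_on_tag "$join_dir/file1.txt" "$join_dir/file2.txt"
```

Note that join prints the tag first and the rows come out in sorted tag order, so the original row order of the stats file is not preserved.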