How to grep file2 for the entries in file1 and output the matches in file1's order

Date: 2016-04-09 17:46:02

Tags: unix grep

I have file1.txt with the following contents:

rs002
rs113
rs209
rs227
rs151 
rs104

I have file2.txt with the following contents:

rs113   113
rs002   002
rs227   227
rs209   209
rs104   104
rs151   151

I want to extract the lines of file2.txt that match the records in file1.txt. I tried this:

grep -Fwf file1.txt file2.txt 

The output is:

rs113   113
rs002   002
rs227   227
rs209   209
rs104   104
rs151   151

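This extracts all the matching lines, but in the order in which they occur in file2.txt. Is there a way to extract the matching records while keeping the order of file1.txt? The desired output is: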

rs002   002
rs113   113
rs209   209
rs227   227
rs151   151
rs104   104

4 Answers:

Answer 0: (score: 2)

One (admittedly not very elegant) solution is to loop over file1.txt and look for a match for each line:

while IFS= read -r line; do
    grep -wF "$line" file2.txt
done < file1.txt

This gives the output

rs002   002
rs113   113
rs209   209
rs227   227
rs151   151
rs104   104

If you know that each line occurs at most once, you can speed this up by telling grep to stop after the first match:

grep -m 1 -wF "$line" file2.txt

As far as I know, -m is a GNU extension.
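Put together, the loop with the early exit might look like the sketch below. It assumes a grep that supports -m (GNU grep does); the scratch directory and the `workdir` and `result` variable names are only for illustration, and the sample data is the data from the question.

```shell
# Sample data from the question, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' rs002 rs113 rs209 rs227 rs151 rs104 > "$workdir/file1.txt"
printf '%s\n' 'rs113   113' 'rs002   002' 'rs227   227' \
              'rs209   209' 'rs104   104' 'rs151   151' > "$workdir/file2.txt"

# For each ID in file1.txt, print the first matching line of file2.txt.
# -m 1 makes grep stop reading file2.txt after the first match.
result=$(
    while IFS= read -r line; do
        grep -m 1 -wF -- "$line" "$workdir/file2.txt"
    done < "$workdir/file1.txt"
)
printf '%s\n' "$result"
```

grep is still started once per line of file1.txt, so -m 1 only shortens each individual search; it does not remove the per-line process cost.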

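Note that looping over a file to do some processing on another file in each iteration is usually a sign that there is a much more efficient way to do things, so this should only be used for files small enough that coming up with a better solution would take longer than processing them with this one.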

Answer 1: (score: 2)

This is too complicated for grep. If file2.txt is not huge, i.e. it fits in memory, you should use awk instead:

 awk 'FNR==NR { f2[$1] = $2; next } $1 in f2 { print $1, f2[$1] }' file2.txt file1.txt

Output:

rs002 002
rs113 113
rs209 209
rs227 227
rs151 151
rs104 104
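The output above has a single space between the columns because `print $1, f2[$1]` joins its arguments with awk's output field separator. If the lines of file2.txt should come out exactly as they were written, one possible variant is to store the whole line (`$0`) instead of just the second field. This is only a sketch: the array name `line` and the `workdir` and `result` variables are made up for the example.

```shell
# Sample data from the question, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' rs002 rs113 rs209 rs227 rs151 rs104 > "$workdir/file1.txt"
printf '%s\n' 'rs113   113' 'rs002   002' 'rs227   227' \
              'rs209   209' 'rs104   104' 'rs151   151' > "$workdir/file2.txt"

# First file (file2.txt): remember each full line, keyed by its first field.
# Second file (file1.txt): print the remembered line for each ID, in file1 order.
result=$(awk 'FNR == NR { line[$1] = $0; next } $1 in line { print line[$1] }' \
    "$workdir/file2.txt" "$workdir/file1.txt")
printf '%s\n' "$result"
```

IDs in file1.txt that have no entry in file2.txt are silently skipped, as in the original one-liner.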

Answer 2: (score: 0)

Create a sed command file from file2:

 sed 's#^\([^ ]*\)\(.*\)#/\1/ s/$/\2/#' file2 > tmp.sed
 sed -f tmp.sed file1

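These two lines can be combined to avoid the temporary file (this uses process substitution, so it needs bash or a similar shell):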

sed -f <(sed 's#^\([^ ]*\)\(.*\)#/\1/ s/$/\2/#' file2) file1
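To make the mechanism easier to follow, the sketch below runs the two-step version on the question's data and keeps the first generated command for inspection. The `workdir`, `script_first_line` and `result` names are illustrative only. Two caveats follow from how the script is built: the generated address `/ID/` is an unanchored regular expression, so an ID that is a substring of another ID would match as well, and lines of file1 that have no counterpart in file2 are printed unchanged instead of being dropped.

```shell
# Sample data from the question, written to a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' rs002 rs113 rs209 rs227 rs151 rs104 > "$workdir/file1"
printf '%s\n' 'rs113   113' 'rs002   002' 'rs227   227' \
              'rs209   209' 'rs104   104' 'rs151   151' > "$workdir/file2"

# Turn each line of file2 into a sed command of the form
#   /ID/ s/$/REST/
# which appends REST (the text after the ID) to every line containing ID.
sed 's#^\([^ ]*\)\(.*\)#/\1/ s/$/\2/#' "$workdir/file2" > "$workdir/tmp.sed"
script_first_line=$(head -n 1 "$workdir/tmp.sed")

# Apply the generated commands to file1; the order of file1 is preserved
# because sed processes file1 line by line.
result=$(sed -f "$workdir/tmp.sed" "$workdir/file1")
printf '%s\n' "$script_first_line" "$result"
```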

Answer 3: (score: -1)

This should help (but it won't be optimal for large inputs):
