查找两个文件之间的公共行

时间:2014-07-26 10:44:12

标签: linux sorting join grep

文件1:

6
9219045
71608707
105853666
106000373
106000464
106000814
106001204
106001483
106002054

文件2:

6,rO0ABXNyADljb20uYW1hem9uLnBvaW50c3BsYXRmb3JtLnV0aWwuUG9pbnRzUGxhdGZvcm1DcnlwdE1lc3NhZ2Xio1+sC+m4CAIABFsACGNpcGhlcklWdAACW0JbAApjaXBoZXJUZXh0cQB+AAFMAAxtYXRlcmlhbE5hbWV0ABJMamF2YS9sYW5nL1N0cmluZztMAA5tYXRlcmlhbFNlcmlhbHQAEExqYXZhL2xhbmcvTG9uZzt4cHVyAAJbQqzzF/gGCFTgAgAAeHAAAAAQufMrUK+8A4e0iJV4ktLQgXVxAH4ABQAAAEBNoyuUZLYRLaBqLvsvzHxxv63pO+4UPsRqpp/oHURcBdT6NES2G5H6+Kc3yjZOXDIIhHN1efAxyM/iWD0qDev9dAAwY29tLmFtYXpvbi5wb2ludHMuZW5jcnlwdGlvbi5rZXkuYWNjb3VudHNzZXJ2aWNlc3IADmphdmEubGFuZy5Mb25nO4vkkMyPI98CAAFKAAV2YWx1ZXhyABBqYXZhLmxhbmcuTnVtYmVyhqyVHQuU4IsCAAB4cAAAAAAAAAAB,jp-points
55555,rO0ABXNyADljb20uYW1hem9uLnBvaW50c3BsYXRmb3JtLnV0aWwuUG9pbnRzUGxhdGZvcm1DcnlwdE1lc3NhZ2Xio1+sC+m4CAIABFsACGNpcGhlcklWdAACW0JbAApjaXBoZXJUZXh0cQB+AAFMAAxtYXRlcmlhbE5hbWV0ABJMamF2YS9sYW5nL1N0cmluZztMAA5tYXRlcmlhbFNlcmlhbHQAEExqYXZhL2xhbmcvTG9uZzt4cHVyAAJbQqzzF/gGCFTgAgAAeHAAAAAQ5C9LG75v8+ENmmteRa/bBHVxAH4ABQAAAFBgXjgKk6KvTg4FiPfWF/7Ittzk/MpmlBecYkc9Bc+3mAV7R58rcl1hGkFdk3MagFXjUsunbE0qcV+Gy+DwhUWpBYDpA3p9q9oO8zwDJfFqCHQAMGNvbS5hbWF6b24ucG9pbnRzLmVuY3J5cHRpb24ua2V5LmFjY291bnRzc2VydmljZXNyAA5qYXZhLmxhbmcuTG9uZzuL5JDMjyPfAgABSgAFdmFsdWV4cgAQamF2YS5sYW5nLk51bWJlcoaslR0LlOCLAgAAeHAAAAAAAAAAAQ==,jp-points
74292,rO0ABXNyADljb20uYW1hem9uLnBvaW50c3BsYXRmb3JtLnV0aWwuUG9pbnRzUGxhdGZvcm1DcnlwdE1lc3NhZ2Xio1+sC+m4CAIABFsACGNpcGhlcklWdAACW0JbAApjaXBoZXJUZXh0cQB+AAFMAAxtYXRlcmlhbE5hbWV0ABJMamF2YS9sYW5nL1N0cmluZztMAA5tYXRlcmlhbFNlcmlhbHQAEExqYXZhL2xhbmcvTG9uZzt4cHVyAAJbQqzzF/gGCFTgAgAAeHAAAAAQPxjL0KWZoaYxWY7clP57tnVxAH4ABQAAAFB6WiMY05SU2WiYqaC7CzwMP2kQ51ec9mkIPh7R4fz2LPwfT8VNpAwH0QLM3I497D2JLfK13S6S90dxpU1ny2VBwaU4imxVchwo7YrcvwvEZXQAMGNvbS5hbWF6b24ucG9pbnRzLmVuY3J5cHRpb24ua2V5LmFjY291bnRzc2VydmljZXNyAA5qYXZhLmxhbmcuTG9uZzuL5JDMjyPfAgABSgAFdmFsdWV4cgAQamF2YS5sYW5nLk51bWJlcoaslR0LlOCLAgAAeHAAAAAAAAAAAQ==,jp-points

文件1只有一列,我使用命令sort -n file1

对文件进行排序

文件2有三列,我使用命令sort -t "," -k 1n,1 file2对文件进行排序,该命令基于ist列进行排序。

现在,我想在file2中找到从file1

中的行开始的行

我尝试过的命令:

grep -w -f file1 file2

join -t "," -1 1 -2 1 -o 2.2 file1 file2

但是,我没有得到理想的结果。请为我提供替代方法。文件1有行7124458,文件2有行42987432。

2 个答案:

答案 0 :(得分:1)

使用awk:

awk -F, 'FNR == NR { ++a[$0]; next } $1 in a' file1 file2

输出:

6,rO0ABXNyADljb20uYW1hem9uLnBvaW50c3BsYXRmb3JtLnV0aWwuUG9pbnRzUGxhdGZvcm1DcnlwdE1lc3NhZ2Xio1+sC+m4CAIABFsACGNpcGhlcklWdAACW0JbAApjaXBoZXJUZXh0cQB+AAFMAAxtYXRlcmlhbE5hbWV0ABJMamF2YS9sYW5nL1N0cmluZztMAA5tYXRlcmlhbFNlcmlhbHQAEExqYXZhL2xhbmcvTG9uZzt4cHVyAAJbQqzzF/gGCFTgAgAAeHAAAAAQufMrUK+8A4e0iJV4ktLQgXVxAH4ABQAAAEBNoyuUZLYRLaBqLvsvzHxxv63pO+4UPsRqpp/oHURcBdT6NES2G5H6+Kc3yjZOXDIIhHN1efAxyM/iWD0qDev9dAAwY29tLmFtYXpvbi5wb2ludHMuZW5jcnlwdGlvbi5rZXkuYWNjb3VudHNzZXJ2aWNlc3IADmphdmEubGFuZy5Mb25nO4vkkMyPI98CAAFKAAV2YWx1ZXhyABBqYXZhLmxhbmcuTnVtYmVyhqyVHQuU4IsCAAB4cAAAAAAAAAAB,jp-point

答案 1 :(得分:0)

join(1)假设两个文件在连接字段上按字母顺序排序。尝试在不使用-n的情况下对输入进行排序。

(更确切地说,它取决于LC_COLLATE设置。如果您正在为两个程序相互通信的好处进行排序,那么为连接和排序设置LC_ALL = C可能更可靠,以避免任何故障由于区域设置。)