Question

I have two text files: 1.txt has 11,000 IPs and 2.txt has 1 million IPs. I want to match 1.txt against 2.txt (the 1 million IPs) and get the entries that match.

#1.txt
1,1.1.1.1
2,2.2.2.2
3,3.3.3.3
.........

#2.txt
51.51.6.10
12.10.25.16
1.3.50.55
0.0.0.0
6.6.6.6
1.1.1.1
2.2.2.2
5.5.5.5
6.6.6.6
7.7.7.7
20.200.100.30
Likewise, 1 million lines of IPs.......

Matching Result :
1,1.1.1.1
2,2.2.2.2

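I tried awk -F, 'NR==FNR{a[$0];next}($2 in a)' 2.txt 1.txt, which gives the exact answer on smaller subsets (test runs). But on the original files, 11,000 against 1 million IPs, it returns all the IPs in 1.txt.
I tried sed -n -f <(sed 's|.*|/,&$/p|' 2.txt) 1.txt, and the process gets killed automatically.
I tried comm -23 1.txt 2.txt > 3.txt, which again returns all the IPs from 1.txt.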

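I am not sure whether the problem lies with sed, awk, comm, or something else. Where am I making a mistake, or is matching 1 million IPs simply not possible? Can someone help me figure out what the problem is?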

Reference Used : http://stackoverflow.com/questions/4366533/remove-lines-from-file-which-appear-in-another-file

Answer 1

Assumption #1: the files are sorted as shown in the original question

Assumption #2: the IP addresses are unique
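
Both assumptions can be checked before relying on comm, which expects sorted input. The sketch below is not part of the original answer: is_sorted and has_duplicates are hypothetical helper names, and LC_ALL=C is an assumption chosen so that sort uses plain byte order. Note that the 2.txt sample shown in the question is not sorted and lists 6.6.6.6 twice.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the original answer) that read a list
# of IPs, one per line, on standard input.

# Succeeds if the input is already sorted in byte order (C locale).
is_sorted() {
  LC_ALL=C sort -c 2>/dev/null
}

# Succeeds if at least one line occurs more than once in the input.
has_duplicates() {
  [ -n "$(LC_ALL=C sort | uniq -d | head -n 1)" ]
}

# Example usage (assumes 1.txt and 2.txt from the question exist):
#   cut -d, -f2 1.txt | is_sorted || echo "1.txt IPs are not sorted"
#   has_duplicates < 2.txt && echo "2.txt contains repeated IPs"
```

comm compares lines using the collation order of the current locale, so whatever locale is used for sorting should also be in effect when comm runs.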

If you only want the IP addresses:

$ comm -12 <(cut -d, -f2 1.txt) 2.txt 
1.1.1.1
2.2.2.2

If you want the whole lines from 1.txt:

$ comm -12 <(cut -d, -f2 1.txt) 2.txt | while read -r ip; do grep -Fw -- "$ip" 1.txt; done
1,1.1.1.1
2,2.2.2.2

Update

If my assumption #1 does not hold, you have to sort 1.txt and 2.txt inline.

Here is the solution to get the common IP addresses:

$ comm -12 <(cut -d, -f2 1.txt |sort) <(sort 2.txt) 
1.1.1.1
2.2.2.2

This will show the full lines from 1.txt:

$ comm -12 <(cut -d, -f2 1.txt | sort) <(sort 2.txt) | while read -r ip; do grep -Fw -- "$ip" 1.txt; done
1,1.1.1.1
2,2.2.2.2
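
The loop above rescans 1.txt once for every matching IP. As an alternative sketch, not the method of the original answer, a single awk pass can print the full lines without sorting either file. match_full_lines is a hypothetical helper name, and the removal of a trailing carriage return is an assumption about the input (files saved with Windows line endings), not something the question confirms.

```shell
#!/usr/bin/env bash
# match_full_lines IP_LIST INDEXED_FILE  (hypothetical helper)
#   IP_LIST      : one IP per line, like 2.txt in the question
#   INDEXED_FILE : "index,IP" lines, like 1.txt in the question
# Prints every line of INDEXED_FILE whose second field occurs in IP_LIST.
# Caveat: NR == FNR is also true for the second file when IP_LIST is empty.
match_full_lines() {
  awk -F, '
    { sub(/\r$/, "") }             # assumption: drop a trailing CR if present
    NR == FNR { seen[$0]; next }   # first file: remember each IP
    $2 in seen                     # second file: print lines with a known IP
  ' "$1" "$2"
}

# Example usage (assumes 1.txt and 2.txt from the question exist):
#   match_full_lines 2.txt 1.txt > 3.txt
```

Memory use grows with the number of distinct IPs in the first file, since each one is stored as an array key.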

I also ran a quick test on my small MacBook Air with 1 million IPs in 1.txt and 0.5 million IPs in 2.txt. When the files have to be sorted, it takes 19 seconds.

BASH ISSUE: Compare two different larger-set text files and get the matching IP addresses
