I have several files sorted by the uniq counts collected during a simulation. Example: File1 (the third column is 126 bits long):
12018647
290704 Instr1: 000000000000000000000000000000001010000111000010101001110000000000100001100101111011000000000000000000000000000000000000000001
276277 Instr1: 000000000000000000000000001100011110000000000111101000011000000000100000110110100101000000000000000000000000000000000000000001
248268 Instr1: 000000000001111111111111110100001110000000000000101000011000000000100001100101110010000000000000000000000000000000000000000001
230387 Instr1: 000001010111111111111111100100000000000101000100100110100000000000100001100101110011000000000000000000000000000000000000000001
229445 Instr1: 000000000000000000000000000000001010001011000000101000010000000000100001100101111001000000000000000000000000000000000000000001
224885 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
218722 Instr1: 000000100110000000000000000100001110000000000110100100000000000000100000110110100011000000000000000000000000000000000000000001
216637 Instr1: 000000000000000000111100000100001010000000000000010100010000000000100001100101110101000000000000000000000000000000000000000001
211294 Instr1: 000000000000000000000000000000001010001111000110101011101000000000100001100101101111010000000000000000000000000000000000000001
201754 Instr1: 000000000000000000000000000000011010001001000000101000010000000000100001100101111010000000000000000000000000000000000000000001
199568 Instr1: 000001010111000110111100100100000000001001011100100110100000000000100001100101111000010000000000000000000000000000000000000001
192394 Instr1: 000000110111000110111100100100001010000000011100100101000000000000100001100101111111010000000000000000000000000000000000000001
156719 Instr1: 000001010111000110111100000100000000001011011100100110100000000000100001100101110100000000000000000000000000000000000000000001
154935 Instr1: 000000110111000110111011000100010110000000011100100101000000000000100001100101110001000000000000000000000000000000000000000001
152440 Instr1: 000000110111111111111111100100001010000000000011100101000000000000100001100101111101100000000000000000000000000000000000000001
150409 Instr1: 000000110111000110111100100100001110000000011100100101000000000000100001100101110111010000000000000000000000000000000000000001
142168 Instr1: 000000110111000110111010100100011010000000011100100101000000000000100001100101101110010000000000000000000000000000000000000001
127784 Instr1: 000001010110000000000000000100000000000101000110100110100000000000100000010101000110010000000000000000000000000000000000000001
126609 Instr1: 000000110110000000000000100100001010000000000011100101000000000000100000010101001000110000000000000000000000000000000000000001
107861 Instr1: 000000000000000000000000000000011010000101000000101000010000000000100000010101000101010000000000000000000000000000000000000001
97748 Instr1: 000000110110000000000000100101001010000000010010100101000000000000100000010101000111010000000000000000000000000000000000000001
96644 Instr1: 000000100110000000000000000100001010000000000110100100000000000000100000110110100100000000000000000000000000000000000000000001
89944 Instr1: 000000110111000110011110000100001010000000011100100101000000000000100000110111010101000000000000000000000000000000000000000001
84330 Instr1: 000000000000000000011111111100001010000000000010101001111000000000100001100111111100000000000000000000000000000000000000000001
81039 Instr1: 000000000000000000000001100100010010000000000000101000011000000000100000010101000100110000000000000000000000000000000000000001
77980 Instr1: 000000100110000000000000001100001010000000010001100100000000000000100000010110010000000000000000000000000000000000000000000001
76378 Instr1: 000000110110000000000000100101000010000000000100100101000000000000100000010111010010000000000000000000000000000000000000000001
68031 Instr1: 000000110111000110011110100100001110000000011100100101000000000000100000110111010010100000000000000000000000000000000000000001
67762 Instr1: 000000000000000000000000000000010010100001000000101000010000000000100000010111010010110000000000000000000000000000000000000001
66508 Instr1: 000001010110000000000000000100000000000001000100100110100000000000100000110110111110000000000000000000000000000000000000000001
59293 Instr1: 000000000000000000000000000000010010100001000000101000010000000000100000010101010001110000000000000000000000000000000000000001
57900 Instr1: 000000110110000000000000100101000010000000000100100101000000000000100000010101010001000000000000000000000000000000000000000001
56217 Instr1: 000000110111000000011100000100001010000000011100100101000000000000100001011001110000110000000000000000000000000000000000000001
56113 Instr1: 000000000000000000000011000100001010000000000010101011001000000000100001010010101101110000000000000000000000000000000000000001
Similarly, I have File2 (the third column is 126 bits long):
3367689
2267317 Instr1: 000000000000000000000000000000001010000101001000101000101000000000100000000100101001000000000000000000000000000000000000000001
395148 Instr1: 000000000000000000000000000000001010000101011110101011011000000000100000000100101000000000000000000000000000000000000000000001
393699 Instr1: 000000110110000000000110100100010110000000010000100101000000000000100000000100101111100000000000000000000000000000000000000001
283811 Instr1: 000000110110000000000000000101000010000000000101100101000000000000100000000100100111000000000000000000000000000000000000000001
4961 Instr1: 000001010111111111111110100100000000010101000101100110100000000000100000000011111000010000000000000000000000000000000000000001
3350 Instr1: 000001010111111111111111000100000000000101000011100110100000000000100000000011110111010000000000000000000000000000000000000001
1975 Instr1: 000000110111111111111100000100001010000000000101100101000000000000100000000011110100010000000000000000000000000000000000000001
1928 Instr1: 000000110111111111111110000100001010000000000101100101000000000000100000000011110110010000000000000000000000000000000000000001
1833 Instr1: 000000110111111111111100100100001010000000000101100101000000000000100000000011110101010000000000000000000000000000000000000001
1725 Instr1: 000000000000000000000011111100001010000000001000101010111000000000100000000011110010010000000000000000000000000000000000000001
1575 Instr1: 000000000000000000000000000000010110001001000010101000010000000000100000000011110011010000000000000000000000000000000000000001
1487 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
584 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000100110000000000000000000000000000000000000000001
495 Instr1: 000000000000000000000000001101011110000000010111101000011000000000100000000100101100110000000000000000000000000000000000000001
481 Instr1: 000000000000000000000001000101110110000000011101101000011000000000100000000011111001000000000000000000000000000000000000000001
452 Instr1: 000001010110000000000010000100000000010001011101100110100000000000100000000100101100000000000000000000000000000000000000000001
376 Instr1: 000000110110000000001000000100100010000000011101100101000000000000100000000100101010000000000000000000000000000000000000000001
342 Instr1: 000000000000000000000000000000010110101111000000101000010000000000100000000100101011000000000000000000000000000000000000000001
339 Instr1: 000001010110000000000010100100000000010101000010100110100000000000100000000011110001000000000000000000000000000000000000000001
339 Instr1: 000000000001111111111111000101110110000000011101101000011000000000100000000011101111000000000000000000000000000000000000000001
339 Instr1: 000000000000000000000000101100001010000000001001101010101000000000100000000011110011000000000000000000000000000000000000000001
339 Instr1: 000000000000000000000000101100001010000000000101101010101000000000100000000011110000000000000000000000000000000000000000000001
339 Instr1: 000000000000000000000000001100110010000000000000101000011000000000100000000011110010000000000000000000000000000000000000000001
325 Instr1: 000000110110000000000101100100001010000000010000100101000000000000100000000100101000100000000000000000000000000000000000000001
325 Instr1: 000000000000000000000000000000001110010001000010101000010000000000100000000100101001100000000000000000000000000000000000000001
257 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000100111000000000000000000000000000000000000000001
120 Instr1: 000001010111111111111110000100000000010101000101100110100000000000100000000011111000000000000000000000000000000000000000000001
120 Instr1: 000001010111111111111110000100000000000101000011100110100000000000100000000011110110000000000000000000000000000000000000000001
120 Instr1: 000001010111111111111100000100000000000101000011100110100000000000100000000011110101000000000000000000000000000000000000000001
120 Instr1: 000000000000000000000000000000100010010011000000101000010000000000100000000011110111000000000000000000000000000000000000000001
84 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000100101000000000000000000000000000000000000000001
84 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000100100000000000000000000000000000000000000000001
84 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000100011000000000000000000000000000000000000000001
84 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000100010000000000000000000000000000000000000000001
84 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000100001000000000000000000000000000000000000000001
84 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000100000000000000000000000000000000000000000000001
84 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000011111000000000000000000000000000000000000000001
84 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000011110000000000000000000000000000000000000000001
84 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000011101000000000000000000000000000000000000000001
84 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000011100000000000000000000000000000000000000000001
84 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000011011000000000000000000000000000000000000000001
84 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000011010000000000000000000000000000000000000000001
84 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000011001000000000000000000000000000000000000000001
84 Instr1: 000000000000000000000000000000000000000000000000000000000000000000000000000000011000000000000000000000000000000000000000000001
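The files don't necessarily have the same number of lines. Now I want to compare the two files and find out whether they share any third-column values, together with the corresponding column-1 number from each file:
Example output (made-up values):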
FileA FileB Data
290704 283811 000000000001111111111111110100001110000000000000101000011000000000100001100101110010000000000000000000000000000000000000000001
I generated these files with the following command:
sort result.txt | uniq -c | sort -nr > File1.txt
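Now I'm not sure how to find the common entries. Unix "comm" doesn't work for me. I think I may need awk or Python, but any suggestion is welcome.
PS: This is not a homework problem.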
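"comm" compares whole lines, and here the leading count makes lines with the same bit string differ, which is probably why it did not work. A minimal sketch using the standard join utility instead, assuming every data line has the form "count Instr1: bits" (lines with a different number of fields, such as a lone total, are dropped). The function name common_counts is hypothetical.

```shell
# common_counts FILE_A FILE_B   (hypothetical helper name)
# Prints "countA countB bits" for every bit string present in both files.
common_counts() {
  tmp=$(mktemp) || return 1
  # Keep only "count Instr1: bits" lines, reduce them to "count bits",
  # and sort on the bit string; join needs both inputs sorted on the key
  # under the same collation, hence LC_ALL=C for both sort and join.
  awk 'NF == 3 { print $1, $3 }' "$2" | LC_ALL=C sort -k2,2 > "$tmp"
  awk 'NF == 3 { print $1, $3 }' "$1" | LC_ALL=C sort -k2,2 |
    LC_ALL=C join -1 2 -2 2 -o 1.1,2.1,1.2 - "$tmp"
  rm -f "$tmp"
}
```

Usage: common_counts File1.txt File2.txt. With the sample data shown, it should report the all-zero string with counts 224885 (File1) and 1487 (File2). Unlike a hash-based approach, join streams both sorted inputs, so memory use stays small at the cost of two sorts.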
Answer 0 (score: 2)
In awk. This is an awk classic, reason enough to learn the language, and a step on the road to a better shell:
$ awk 'NR==FNR{a[$3]=$1;next}$3 in a{print $1, a[$3], $3}' f1 f2
1487 224885 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
Explanation:
$ awk '
NR==FNR{                 # process first file (the smaller)
    a[$3]=$1             # hash to a using $3 as key
    next                 # skip to next record
}
$3 in a{                 # when a match is found processing the second file
    print $1, a[$3], $3  # output: count from f2, count from f1, shared value
}
' f1 f2                  # smaller file first as it is hashed to memory
Note that the whole first file is held in memory, so you need enough RAM for it. Also, if the files really begin with a lone total line (such as 12018647 and 3367689 above), that line has an empty $3 and will "match" the other file's total line; starting the program with NF!=3{next} skips it.
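For more than two files, one possible extension (a sketch, not the answer's own code) collects the count from every file per bit string and prints only strings seen in at least two files. The function name common_multi is hypothetical, and it assumes each bit string occurs at most once per file, which holds for uniq -c output.

```shell
# common_multi FILE...   (hypothetical helper name)
# Prints each bit string found in two or more of the given files, followed by
# "file:count" for every file that contains it.
# Assumes each string occurs at most once per file, as after uniq -c.
common_multi() {
  awk '
    NF != 3 { next }                            # skip lone total lines and blanks
    {
      hits[$3] = hits[$3] " " FILENAME ":" $1   # append file:count for this string
      seen[$3]++                                # how many files contain the string
    }
    END {
      for (k in seen)                           # iteration order is unspecified
        if (seen[k] >= 2) print k hits[k]
    }
  ' "$@"
}
```

Usage: common_multi File1.txt File2.txt File3.txt | sort. Like the two-file version, this keeps every distinct bit string in memory; piping through sort gives a stable order because awk's for-in order is not defined.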
Answer 1 (score: 0)
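I would use an SQLite database for this. It is easy to learn, and once you have the basics down it avoids many of the problems you would run into with other approaches.
Download an SQLite browser and take one of the online courses on Coursera or Udacity.
For your problem, assuming the two files are imported as tables file1 and file2 with columns column1, column2 and column3, it can be as simple as:
CREATE TABLE newtable AS
SELECT file1.column1 AS count1, file2.column1 AS count2, file1.column3
FROM file1
JOIN file2 ON file1.column3 = file2.column3;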