Question

I have 100 text files, each containing a single column. The files look like this:

file1.txt
10032
19873
18326

file2.txt
10032
19873
11254

file3.txt
15478
10032
11254

And so on. The files differ in size. Please tell me how to find the numbers that are common to all 100 files.

Any given number appears at most once within a single file.

Answer 1

This works whether or not the same number can appear more than once in a single file:

$ awk '{a[$0][ARGIND]} END{for (i in a) if (length(a[i])==ARGIND) print i}' file[123]
10032

The above uses GNU awk for true multidimensional arrays and ARGIND. If necessary, it is easy to adapt for other awk implementations, e.g.:

$ awk '!seen[$0,FILENAME]++{a[$0]++} END{for (i in a) if (a[i]==ARGC-1) print i}' file[123]
10032
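As a rough check of the portable variant, the sketch below rebuilds the question's three sample files in a scratch directory and deliberately repeats 10032 inside file1.txt. The directory and the `common` variable are illustrative names, not part of the original answer.

```shell
# Hypothetical scratch directory; file contents mirror the question's samples.
dir=$(mktemp -d)
printf '%s\n' 10032 19873 18326 10032 > "$dir/file1.txt"   # 10032 repeated on purpose
printf '%s\n' 10032 19873 11254       > "$dir/file2.txt"
printf '%s\n' 15478 10032 11254       > "$dir/file3.txt"

# Count each value at most once per file, then keep the values seen in every file.
common=$(awk '!seen[$0,FILENAME]++ {a[$0]++}
              END {for (i in a) if (a[i] == ARGC-1) print i}' "$dir"/file[123].txt)

printf '%s\n' "$common"   # prints 10032
rm -rf "$dir"
```

The repeated line in file1.txt is counted only once because `seen` is keyed on both the value and the file name.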

If the numbers are unique within each file, then all you need is:

$ awk '(++c[$0])==(ARGC-1)' file*
10032
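The uniqueness assumption matters for this shortest form. The sketch below, using made-up files `a.txt`, `b.txt` and `c.txt`, shows a value that is missing from one file but still gets printed because it is repeated in another. The guarded variant is only an illustration of one possible fix, not something taken from the answer.

```shell
# Hypothetical test files: 7 appears twice in a.txt, once in b.txt, never in c.txt.
dir=$(mktemp -d)
printf '%s\n' 7 7 > "$dir/a.txt"
printf '%s\n' 7   > "$dir/b.txt"
printf '%s\n' 8   > "$dir/c.txt"

# Shortest form: the counter reaches 3 (the number of files) without 7 being in c.txt.
naive=$(awk '(++c[$0])==(ARGC-1)' "$dir"/a.txt "$dir"/b.txt "$dir"/c.txt)

# Illustrative guard: count a value only the first time it is seen in each file.
safe=$(awk '!seen[$0,FILENAME]++ && (++c[$0])==(ARGC-1)' "$dir"/a.txt "$dir"/b.txt "$dir"/c.txt)

printf 'naive: %s\nsafe: %s\n' "$naive" "$safe"
rm -rf "$dir"
```

Here `naive` holds 7 while `safe` stays empty, which is the correct answer for these three files.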

Answer 2

awk to the rescue!

To find the elements common to all files (assuming each value is unique within a file):

awk '{a[$1]++} END{for(k in a) if(a[k]==ARGC-1) print k}' files

This counts all occurrences and prints the values whose count equals the number of files.
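One practical difference from keying on `$0` is that `$1` ignores leading and trailing whitespace on a line. The sketch below uses two made-up files, one of which has a trailing space after the number, to show the effect. The file names and variables are assumptions for illustration only.

```shell
# Hypothetical files: a.txt has a trailing space after the number, b.txt does not.
dir=$(mktemp -d)
printf '10032 \n' > "$dir/a.txt"
printf '10032\n'  > "$dir/b.txt"

# Keyed on the first field: surrounding whitespace is ignored, so both lines match.
by_field=$(awk '{a[$1]++} END{for(k in a) if(a[k]==ARGC-1) print k}' "$dir"/a.txt "$dir"/b.txt)

# Keyed on the whole line: "10032 " and "10032" are different keys, so nothing is common.
by_line=$(awk '{a[$0]++} END{for(k in a) if(a[k]==ARGC-1) print k}' "$dir"/a.txt "$dir"/b.txt)

printf 'by field: %s\nby line: %s\n' "$by_field" "$by_line"
rm -rf "$dir"
```

Which behaviour is preferable depends on how clean the input files are.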

Answer 3

Files with a single column?

You can sort and compare these files using the shell:

for f in file*.txt; do sort "$f" | uniq; done | sort | uniq -c -d

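The trailing -c is optional; it is only needed if you want the occurrence counts. Note that uniq -d prints every value that occurs in more than one file, not only those present in all of them; to restrict the output to values common to every file, compare the count from -c against the number of files.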
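A possible way to narrow the sort/uniq pipeline down to values present in every file is to compare the count column against the number of files. This is a sketch under that assumption, run on the question's sample data in a scratch directory. The names `dir`, `n` and `common` are illustrative.

```shell
# Hypothetical scratch directory holding the question's three sample files.
dir=$(mktemp -d)
printf '%s\n' 10032 19873 18326 > "$dir/file1.txt"
printf '%s\n' 10032 19873 11254 > "$dir/file2.txt"
printf '%s\n' 15478 10032 11254 > "$dir/file3.txt"

# Put the file list into the positional parameters so $# is the number of files.
set -- "$dir"/file*.txt
n=$#

# De-duplicate each file, merge, count, and keep only values counted once per file.
common=$(for f in "$@"; do sort -u "$f"; done | sort | uniq -c |
         awk -v n="$n" '$1 == n {print $2}')

printf '%s\n' "$common"   # prints 10032
rm -rf "$dir"
```

With plain `uniq -d` the same input would also report 19873 and 11254, since each of them occurs in two of the three files.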

Answer 4

One using Bash and comm, because I needed to know whether it could be done. My test files are named 1, 2 and 3, hence the for f in ?:

f=$(shuf -n1 -e ?)                     # pick one file randomly for initial comms file

sort "$f" > comms 

for f in ?                             # this time for all files
do 
  comm -1 -2 <(sort "$f") comms > tmp  # comms should be in sorted order always
  # grep -Fxf "$f" comms > tmp         # another solution, thanks @Sundeep
  mv tmp comms
done
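The loop leaves its result in the file comms. For shells without process substitution, the same idea can be sketched with an intermediate file. The version below is an adaptation, not the answer's own code: it seeds the intersection from the first file instead of a random one and works in a scratch directory with illustrative names.

```shell
# Hypothetical scratch directory holding the question's three sample files.
dir=$(mktemp -d)
printf '%s\n' 10032 19873 18326 > "$dir/file1.txt"
printf '%s\n' 10032 19873 11254 > "$dir/file2.txt"
printf '%s\n' 15478 10032 11254 > "$dir/file3.txt"

set -- "$dir"/file*.txt
sort "$1" > "$dir/comms"            # seed the running intersection with the first file

for f in "$@"; do
  sort "$f" > "$dir/sorted"         # comm needs both inputs sorted
  comm -12 "$dir/sorted" "$dir/comms" > "$dir/tmp"   # keep only lines found in both
  mv "$dir/tmp" "$dir/comms"
done

common=$(cat "$dir/comms")
printf '%s\n' "$common"   # prints 10032
rm -rf "$dir"
```

Because the intersection can only shrink, the loop could also stop early once comms becomes empty.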

Find common values in multiple files containing a single column of values
