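I have many single-column text files containing unsorted values. The goal is to join them, but the Linux `join` utility requires its input files to be sorted. Any idea how to do this without sorting?
A.txt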
0000;
0001;
0002;
0003;
B.txt
0000;
0011;
0012;
0003;
C.txt
0000;
0024;
0003;
0025;
Expected output:
0000;
0003;
Answer 0: (score: 0)
To get around the "number of files" and "duplicate elements" problems in twalberg's fine awk program, I would use the more verbose:
#!/usr/bin/python2
from sys import argv
# collect all lines from each file in their own set
sets = []
for path in argv[1:]:
    with open(path) as infile:
        s = set(infile.readlines())
    sets.append(s)
# find the common items in all sets
common = sets[0]
for s in sets[1:]:
    common = common.intersection(s)
# print the common items in the order they appear in the
# first file
with open(argv[1]) as infile:
    for line in infile:
        if line in common:
            common.remove(line)  # prevents duplicates
            print line,
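For readers on Python 3, the same set-intersection idea can be sketched roughly as follows. This is a port under my own assumptions, not the answerer's code: the helper names `common_in_order` and `read_lines` are hypothetical, and line endings are stripped so that a final line without a trailing newline still compares equal to the same value in another file.

```python
#!/usr/bin/env python3
import sys


def common_in_order(line_lists):
    """Return items present in every list, in first-list order, once each."""
    if not line_lists:
        return []
    # Intersect the sets of items across all inputs.
    common = set(line_lists[0])
    for lines in line_lists[1:]:
        common &= set(lines)
    # Walk the first input so the output keeps its order.
    result = []
    for item in line_lists[0]:
        if item in common:
            common.remove(item)  # emit each common item only once
            result.append(item)
    return result


def read_lines(path):
    # Hypothetical helper: strip "\n" so the last line needs no newline.
    with open(path) as infile:
        return [line.rstrip("\n") for line in infile]


if __name__ == "__main__":
    for item in common_in_order([read_lines(p) for p in sys.argv[1:]]):
        print(item)
```

Unlike the original, this sketch prints nothing when called with no file arguments instead of failing on `sets[0]`.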
Answer 1: (score: 0)
I think this requires GNU awk, for its multidimensional arrays:
gawk '
FNR == 1 {nfiles++}
{seen[$1][FILENAME] = 1}
END {for (item in seen) if (length(seen[item]) == nfiles) print item}
' A.txt B.txt C.txt
0000;
0003;
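As a rough cross-check of the counting logic, here is a Python sketch of the same idea: record which inputs each value appears in, then keep the values seen in all of them. The function name `seen_in_all` is hypothetical. One difference worth knowing: awk's `for (item in seen)` visits keys in an unspecified order, so the gawk output is not guaranteed to follow file order, whereas this sketch keeps first-seen order.

```python
import sys


def seen_in_all(line_lists):
    """Return values that occur in every input, in first-seen order."""
    seen = {}  # value -> set of input indexes that contain it
    for index, lines in enumerate(line_lists):
        for value in lines:
            seen.setdefault(value, set()).add(index)
    # dicts preserve insertion order (Python 3.7+)
    return [value for value, sources in seen.items()
            if len(sources) == len(line_lists)]


if __name__ == "__main__":
    inputs = []
    for path in sys.argv[1:]:
        with open(path) as infile:
            # Take the first whitespace-separated field, like $1 in awk;
            # blank lines are skipped (an assumption of this sketch).
            inputs.append([line.split()[0] for line in infile if line.split()])
    for value in seen_in_all(inputs):
        print(value)
```

Because each value maps to a set of input indexes, a value repeated within one file is still counted once for that file, which mirrors what `seen[$1][FILENAME] = 1` does.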
Answer 2: (score: 0)
A TXR Lisp solution:
(defvar hash-list
  (collect-each ((a *args*))
    (hash-construct '(:equal-based) (zip (get-lines (open-file a))))))
(if hash-list
  (dohash (key val [reduce-left hash-isec hash-list])
    (put-line key)))
$ txr join.tl
$ txr join.tl A.txt
0000;
0001;
0002;
0003;
$ txr join.tl A.txt B.txt C.txt
0000;
0003;