Join multiple unsorted text files

Date: 2014-04-29 18:17:03

Tags: linux text-processing

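I have many single-column text files containing unsorted values. The goal is to join them, but the Linux "join" utility requires its input files to be sorted. Any idea how to do this without sorting?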

A.txt

0000;
0001;
0002;
0003;

B.txt

0000;
0011;
0012;
0003;

C.txt

0000;
0024;
0003;
0025;

Expected output:

0000;
0003;

3 answers:

Answer 0 (score: 0)

To overcome the "number of files" and "duplicate elements" problems in twalberg's fine awk program, I would use the more verbose:

#!/usr/bin/env python3

from sys import argv

if len(argv) < 2:
    raise SystemExit("usage: %s FILE..." % argv[0])

# collect all lines from each file in their own set
# (newlines are stripped so an unterminated last line still matches)

sets = []
for path in argv[1:]:
    with open(path) as infile:
        s = set(line.rstrip("\n") for line in infile)
        sets.append(s)

# find the common items in all sets

common = sets[0]
for s in sets[1:]:
    common = common.intersection(s)

# print the common items in the order they appear in the
# first file

with open(argv[1]) as infile:
    for line in infile:
        line = line.rstrip("\n")
        if line in common:
            common.remove(line)  # prevents duplicates
            print(line)

Answer 1 (score: 0)

I think this requires GNU awk, for its arrays of arrays (the seen[$1][FILENAME] syntax):

gawk '
    FNR == 1 {nfiles++}
    {seen[$1][FILENAME] = 1} 
    END {for (item in seen) if (length(seen[item]) == nfiles) print item}
' A.txt B.txt C.txt
0000;
0003;
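
Where gawk is not available, the same counting idea can be sketched in Python: record which files each value was seen in, then keep the values seen in every file. This is only a sketch under a few assumptions that are not part of the answer above: the function name common_values is made up for illustration, whole lines are compared rather than the first whitespace-separated field, and results are returned in first-seen order.

```python
import sys
from collections import defaultdict


def common_values(paths):
    """Return the values present in every file, in first-seen order."""
    seen = defaultdict(set)  # value -> indices of the files containing it
    for index, path in enumerate(paths):
        with open(path) as infile:
            for line in infile:
                seen[line.rstrip("\n")].add(index)
    # dicts keep insertion order, so results follow first appearance
    return [value for value, files in seen.items() if len(files) == len(paths)]


if __name__ == "__main__":
    for value in common_values(sys.argv[1:]):
        print(value)
```

Tracking file indices instead of file names means a path given twice is still counted as two inputs, and a value repeated inside one file is counted once for that file.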

Answer 2 (score: 0)

TXR Lisp solution:

(defvar hash-list
  (collect-each ((a *args*))
    (hash-construct '(:equal-based) (zip (get-lines (open-file a))))))

(if hash-list
  (dohash (key val [reduce-left hash-isec hash-list])
    (put-line key)))

$ txr join.tl
$ txr join.tl A.txt
0000;
0001;
0002;
0003;
$ txr join.tl A.txt B.txt C.txt
0000;
0003;
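
For readers who do not use TXR Lisp, the left fold over per-file hash tables can be approximated in Python. This is a rough sketch, not a translation of the TXR code: the helper names read_keys, isect and common_lines are hypothetical, plain dicts stand in for the hashes, and output follows the order of the first file, which the hash-based original does not promise.

```python
import sys
from functools import reduce


def read_keys(path):
    """Map each line of the file to None; a dict stands in for the hash."""
    with open(path) as infile:
        return {line.rstrip("\n"): None for line in infile}


def isect(left, right):
    """Keep the keys of `left` that also occur in `right`."""
    return {key: None for key in left if key in right}


def common_lines(paths):
    tables = [read_keys(path) for path in paths]
    # reduce() raises TypeError on an empty list, so guard it explicitly
    return list(reduce(isect, tables)) if tables else []


if __name__ == "__main__":
    for line in common_lines(sys.argv[1:]):
        print(line)
```

As in the transcript above, no arguments yields no output, and a single file yields its own lines, since folding a one-element list returns that element unchanged.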