Question

I have a file that I want to split into multiple files based on the unique values in the first column. For example, here is the file:

fileA.txt

1    Cat
1    Dog
1    Frog
2    Boy
2    Girl
3    Tree
3    Leaf
3    Branch
3    Trunk

I would like my output to look like this:

file1.txt

1    Cat
2    Boy
3    Tree

file2.txt

1    Dog
2    Girl
3    Leaf

file3.txt

1    Frog
3    Branch

file4.txt

3    Trunk

If a value doesn't exist, I want it skipped. I searched for a situation similar to mine but came up empty. Does anyone know how to do this?

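In theory, this awk command should work: awk '{print > "file" ++a[$1] ".txt"}' input. However, I can't get it to work properly (most likely because I'm working on a Mac). Does anyone know another way?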

Answer 1

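An unparenthesized expression on the right side of output redirection is undefined behavior, so different awk implementations may parse it differently. Try awk '{print > ("file" ++a[$1] ".txt")}' input.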

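If having too many files open at the same time is a problem, get GNU awk. If you can't, append to each file and close it after every write: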

$ ls
fileA.txt

$ awk '{f="file" ++a[$1] ".txt"; print >> f; close(f)}' fileA.txt

$ ls
file1.txt  file2.txt  file3.txt  file4.txt  fileA.txt

$ cat file1.txt
1    Cat
2    Boy
3    Tree
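
For readers who prefer Python, the same append-and-close pattern can be sketched roughly as follows. This is only an illustration, not part of the awk answer: the function name split_append_close and its prefix/suffix parameters are made up for the example, and it assumes whitespace-separated columns.

```python
from collections import Counter


def split_append_close(input_path, prefix="file", suffix=".txt"):
    """Send the n-th line seen for each first-column key to <prefix>n<suffix>.

    Each output file is opened in append mode and closed again right away,
    so at most one output handle is open at any time.
    """
    seen = Counter()  # first-column value -> number of lines seen so far
    with open(input_path) as src:
        for line in src:
            fields = line.split()
            if not fields:  # skip blank lines
                continue
            seen[fields[0]] += 1
            out_path = "{}{}{}".format(prefix, seen[fields[0]], suffix)
            # Append mode keeps lines written to this file earlier in the run.
            with open(out_path, "a") as dst:
                dst.write(line)
```

Calling split_append_close("fileA.txt") should give the same files as the awk command. As with >> in awk, output files left over from a previous run are appended to rather than overwritten, so remove them first.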

Answer 2

Here is a Python solution:

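from collections import Counter

fd_dict = {}             # output index -> open file handle
ind_counter = Counter()  # first-column value -> number of lines seen so far

with open('fileA.txt') as inf:
    for line in inf:
        ind = line.split()[0]
        ind_counter[ind] += 1
        file_ind = ind_counter[ind]
        # Open fileN.txt the first time index N is needed, then reuse it.
        if file_ind not in fd_dict:
            fd_dict[file_ind] = open('file{}.txt'.format(file_ind), 'w')
        fd_dict[file_ind].write(line)

# dict.values() works on Python 2 and 3; itervalues() is Python 2 only.
for fd in fd_dict.values():
    fd.close()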
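
The Python solution above keeps one handle open per output file. If that is a concern and the input fits in memory, a possible variant is to group the lines first and write each file once. This is a sketch under those assumptions; split_buffered and its parameters are hypothetical names chosen for the example.

```python
from collections import defaultdict


def split_buffered(input_path, prefix="file", suffix=".txt"):
    """Group lines by occurrence rank of their first column, then write.

    Returns the sorted list of output indices that were written.
    """
    counts = defaultdict(int)    # first-column value -> lines seen so far
    buckets = defaultdict(list)  # occurrence rank n -> lines for file n
    with open(input_path) as src:
        for line in src:
            fields = line.split()
            if not fields:  # skip blank lines
                continue
            counts[fields[0]] += 1
            buckets[counts[fields[0]]].append(line)

    # Each output file is opened exactly once, written, and closed.
    for n, lines in buckets.items():
        with open("{}{}{}".format(prefix, n, suffix), "w") as dst:
            dst.writelines(lines)
    return sorted(buckets)
```

For the sample fileA.txt, split_buffered("fileA.txt") writes file1.txt through file4.txt. The trade-off is memory: the whole input is held in lists before anything is written.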

Splitting a file based on the values in a specific column
