Question

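I'm trying to compare 1100 files in a directory. I want to compare the values in the "first column" of each file after line.split(), and write an output file with each common value (as the first column) followed by the names of the files in which that value is present (as the next columns), as follows: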

common-value    file-name-1   file-name-2 ..... file-name-n

I have read in all the files with the glob() function, but after that I'm pretty much blank. Can someone suggest a simple solution?

Answer 1

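You can use a dict whose keys are the "first col" items and whose values are lists of the files in which they are found. Update the dict as you go through the files. As mentioned, this can be sped up with a few Python tricks: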

import collections
from glob import glob

# a dictionary that autocreates an empty list as value for each new key
common = collections.defaultdict(list)

for fn in glob('someglob'):
    with open(fn) as fp:
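        # use a set to get the unique column 1 values in this file (blank
        # lines are skipped so split()[0] cannot fail), then iterate to add
        # the file name to the common accumulator
        for col1val in set(line.split()[0] for line in fp if line.strip()):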
            common[col1val].append(fn)

# rebuild accumulator, discarding col1vals from only 1 file
common = {col1val:files for col1val, files in common.items() if len(files) > 1}

for col1val, files in common.items():
    print(col1val, " ".join(files))
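
Since the question asks for an output file rather than printed lines, the same idea can be wrapped in two small functions that write tab-separated rows. This is only a sketch and not part of the original answer: the names find_common_values and write_common are hypothetical, and it assumes whitespace-separated text files in which blank lines can be ignored.

```python
import collections


def find_common_values(paths):
    """Map each first-column value to the sorted list of files containing it.

    Only values that appear in more than one file are kept.
    """
    common = collections.defaultdict(set)
    for path in paths:
        with open(path) as fp:
            for line in fp:
                fields = line.split()
                if fields:  # skip blank lines
                    common[fields[0]].add(path)
    return {val: sorted(files) for val, files in common.items() if len(files) > 1}


def write_common(common, out_path):
    """Write one tab-separated row per value: the value, then the file names."""
    with open(out_path, "w") as out:
        for val in sorted(common):
            out.write("\t".join([val] + common[val]) + "\n")
```

For example, write_common(find_common_values(glob('someglob')), 'common.tsv') would produce one row per shared value, where 'common.tsv' is a placeholder output name. Collecting file names in a set per value avoids duplicates when a value repeats within one file.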
