Question

I currently need to merge two .txt files into a single file. The .txt files are word lists.

Example .txt files:

file1:

A
AND
APRIL
AUGUST

file2:

A
AND
APOSTROPHE
AREA

I want to merge these files into one file in which each word appears only once.

The final file should look like this:

A
AND
APOSTROPHE
APRIL
AREA
AUGUST

I ran into this problem when I tried to combine the files by simply appending one to the other, like this:

filenames = ['data/train/words.txt', 'data/test/words.txt']
with open('data/local/words.txt', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())

How can this be done easily?

Answer 1

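I would use sets, since they do not allow duplicates. | is the set union operator, which combines two sets. Sets are unordered, so at the end you have to sort them, which also turns the result back into a list.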

# Read both word lists; the with block closes the files automatically
with open("file1.txt") as file1, open("file2.txt") as file2:
    words = set(file1.read().splitlines())  # Create a set
    words = words | set(file2.read().splitlines())  # Combine with other word list

words.discard("")  # Ignore blank lines

# sorted() accepts a set and returns a sorted list
with open("fileOUT.txt", "w") as out:
    out.write("\n".join(sorted(words)) + "\n")
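
To see the union operator on its own, here is a minimal in-memory sketch using the sample words from the question. The variable names are illustrative and not part of the answer above.

```python
# Sample word lists from the question, held in memory instead of files.
file1_words = ["A", "AND", "APRIL", "AUGUST"]
file2_words = ["A", "AND", "APOSTROPHE", "AREA"]

# The | operator returns the union of two sets, so duplicates collapse.
unique_words = set(file1_words) | set(file2_words)

# Sets are unordered; sorted() returns a new, alphabetically ordered list.
merged = sorted(unique_words)
print("\n".join(merged))
```

The printed list matches the desired output from the question: six words, each appearing once.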

Answer 2

Read both files into sets and write back the union of the two:

def read_file(fname):
    with open(fname) as fobj:
        return set(entry.strip() for entry in fobj)

data1 = read_file('myfile1.txt')
data2 = read_file('myfile2.txt')

merged = data1.union(data2) 

with open('merged.txt', 'w') as fout:
    for word in sorted(merged):
        fout.write('{}\n'.format(word))

Contents of merged.txt:

A
AND
APOSTROPHE
APRIL
AREA
AUGUST

Answer 3

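Read all the words into a single set (which removes duplicates automatically), then write this set to the output file. Since sets are unordered, we need to sort the set explicitly before writing its contents to the file.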

# Add all words from the files
filenames = ['data/train/words.txt', 'data/test/words.txt']
words = set()
for fname in filenames:
    with open(fname) as infile:
        # Strip line endings so a last line without '\n' still matches
        words |= set(line.strip() for line in infile if line.strip())

# Sort the words
words = sorted(words)  # Now words is a list, not a set!

# Write the result to a file
with open('data/local/words.txt', 'w') as outfile:
    outfile.writelines(word + '\n' for word in words)
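
The same idea can be wrapped in a small helper that accepts any number of input files. This is only a sketch: `merge_word_files` is a hypothetical name, and it assumes one word per line, ignores blank lines, and loads all words into memory. The file names in the usage part are made up for the demonstration.

```python
import os
import tempfile


def merge_word_files(filenames, out_path):
    """Merge word-list files into out_path, sorted and without duplicates."""
    words = set()
    for fname in filenames:
        with open(fname) as infile:
            # Strip whitespace so "AND" and "AND\n" count as the same word,
            # and skip blank lines.
            words.update(line.strip() for line in infile if line.strip())
    merged = sorted(words)
    with open(out_path, "w") as outfile:
        outfile.writelines(word + "\n" for word in merged)
    return merged


# Usage with the sample data from the question, in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path1 = os.path.join(tmp, "file1.txt")
    path2 = os.path.join(tmp, "file2.txt")
    with open(path1, "w") as f:
        f.write("A\nAND\nAPRIL\nAUGUST\n")
    with open(path2, "w") as f:
        f.write("A\nAND\nAPOSTROPHE\nAREA")  # no trailing newline on purpose
    result = merge_word_files([path1, path2], os.path.join(tmp, "merged.txt"))
    print(result)
```

Returning the sorted list as well as writing it makes the helper easy to check without reopening the output file.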
