I need to merge two .txt files into a single file. The files are word lists.
Example .txt files:
file1:
A
AND
APRIL
AUGUST
file2:
A
AND
APOSTROPHE
AREA
I want to merge these files into one file that contains each word only once.
The resulting file should look like this:
A
AND
APOSTROPHE
APRIL
AREA
AUGUST
I ran into this problem when I tried to combine the files by simply appending one to the other, like this:
filenames = ['data/train/words.txt', 'data/test/words.txt']
with open('data/local/words.txt', 'w') as outfile:
    for fname in filenames:
        with open(fname) as infile:
            outfile.write(infile.read())
How can I do this easily?
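Answer 0 (score: 2)
I would use sets, because they do not allow duplicates. | is the set union operator, which combines two sets. Sets are unordered, so at the end you have to convert the result back to a list and sort it.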
file1 = open("file1.txt")
file2 = open("file2.txt")
out = open("fileOUT.txt", "w")
# splitlines() avoids the empty entry that split("\n") leaves after a trailing newline
words = set(file1.read().splitlines())  # Create a set
words = words | set(file2.read().splitlines())  # Combine with other word list
out.write("\n".join(sorted(words)) + "\n")  # sorted() already returns a list
# Now close the files
out.close()
file1.close()
file2.close()
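As a quick check of the idea, here is a minimal sketch that runs the union and the sort on the question's example words held in memory rather than read from disk. The variable names are my own and only for illustration.

```python
# Word lists from the question, held in memory instead of read from files
file1_words = ["A", "AND", "APRIL", "AUGUST"]
file2_words = ["A", "AND", "APOSTROPHE", "AREA"]

# | is the set union operator; duplicate words collapse automatically
merged = set(file1_words) | set(file2_words)

# Sets are unordered, so sort before printing or writing
result = sorted(merged)
print(result)
# ['A', 'AND', 'APOSTROPHE', 'APRIL', 'AREA', 'AUGUST']
```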
Answer 1 (score: 2)
Read both files into sets and write back the union of the two:
def read_file(fname):
    with open(fname) as fobj:
        return set(entry.strip() for entry in fobj)

data1 = read_file('myfile1.txt')
data2 = read_file('myfile2.txt')
merged = data1.union(data2)
with open('merged.txt', 'w') as fout:
    for word in sorted(merged):
        fout.write('{}\n'.format(word))
Contents of merged.txt:
A
AND
APOSTROPHE
APRIL
AREA
AUGUST
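The same read, union, sort, write pattern extends to any number of input files. Below is a rough sketch under my own assumptions: merge_word_files is a hypothetical helper name, blank lines are skipped, and the demo writes throwaway files into a temporary directory so it can be run anywhere.

```python
import os
import tempfile


def merge_word_files(in_paths, out_path):
    """Write the sorted union of all words in in_paths to out_path (hypothetical helper)."""
    words = set()
    for path in in_paths:
        with open(path) as fobj:
            # strip() drops the line ending; the condition skips blank lines
            words.update(line.strip() for line in fobj if line.strip())
    ordered = sorted(words)
    with open(out_path, "w") as fout:
        fout.writelines(word + "\n" for word in ordered)
    return ordered


# Demo with throwaway files holding the question's example words
with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, "myfile1.txt")
    p2 = os.path.join(tmp, "myfile2.txt")
    out = os.path.join(tmp, "merged.txt")
    with open(p1, "w") as f:
        f.write("A\nAND\nAPRIL\nAUGUST\n")
    with open(p2, "w") as f:
        f.write("A\nAND\nAPOSTROPHE\nAREA")  # no trailing newline, on purpose
    merged_words = merge_word_files([p1, p2], out)
    with open(out) as f:
        merged_text = f.read()

print(merged_words)
```

Stripping each line before adding it to the set means a missing final newline in one file does not produce a second copy of that word.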
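Answer 2 (score: 1)
Read all the words into a single set (which removes duplicates automatically), then write this set to the output file. Because sets are unordered, we need to sort the contents ourselves before writing them to the file.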
# Add all words from the files
filenames = ['data/train/words.txt', 'data/test/words.txt']
words = set()
for fname in filenames:
    with open(fname) as infile:
        # Strip line endings so a file without a final newline cannot create a duplicate
        words |= set(line.strip() for line in infile)
# Sort the words
words = sorted(words)  # Now words is a list, not a set!
# Write the result to a file
with open('data/local/words.txt', 'w') as outfile:
    outfile.writelines(word + '\n' for word in words)
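If the input files are already sorted, as the example lists in the question happen to be, a streaming merge is another option. Note that this is a different technique from the set-based answers, not a variant of them: the sketch below assumes sorted inputs, uses heapq.merge and itertools.groupby from the standard library, and merge_sorted_unique is a name I made up for illustration.

```python
import heapq
from itertools import groupby


def merge_sorted_unique(*sorted_iterables):
    """Yield each distinct item once from several already-sorted iterables."""
    # heapq.merge lazily interleaves the sorted inputs in order;
    # groupby then collapses runs of adjacent equal items into one
    for word, _group in groupby(heapq.merge(*sorted_iterables)):
        yield word


# Example lists from the question (both already sorted)
list1 = ["A", "AND", "APRIL", "AUGUST"]
list2 = ["A", "AND", "APOSTROPHE", "AREA"]

streamed = list(merge_sorted_unique(list1, list2))
print(streamed)
# ['A', 'AND', 'APOSTROPHE', 'APRIL', 'AREA', 'AUGUST']
```

This avoids holding every word in memory at once, which may matter for very large lists. With unsorted input the output would be wrong, so the set-based approach is the safer default.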