I have a text file in which many words are repeated. I need each word to appear only once.
Below is the code I tried to write:
import codecs
wordList = codecs.open('Arquivo.txt', 'r')
wordList2 = codecs.open('Arquivo2.txt', 'w')
for x in range(len(wordList)):
    for y in range(x + 1, len(wordList)):
        if wordList[x] == wordList[y]:
            wordList2.append(wordList[x])
for y in wordList2:
    wordList.remove(y)
Error:
wordList2 = codecs.open('File2.txt', 'w').readline()
IOError: File not open for reading
Answer 0 (score: 0)
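Maybe you want to try this. It makes wordList a list rather than a file object, and you can do the same with wordList2. .strip() removes the trailing newline (along with any other leading or trailing whitespace) from each line.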
wordList =[line.strip() for line in codecs.open('File.txt' , 'r').readlines()]
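The same idea can be sketched with a context manager so the file is closed after reading. This is only a minimal sketch: read_words is a hypothetical helper name, the utf-8 encoding is an assumption about your file, and it uses io.open in place of codecs.open.

```python
import io


def read_words(path, encoding='utf-8'):
    """Return the lines of a text file as a list of stripped strings."""
    # read_words is a hypothetical helper; utf-8 is an assumed encoding.
    with io.open(path, 'r', encoding=encoding) as handle:
        # Iterating over the file yields one line at a time;
        # strip() drops the trailing newline and surrounding whitespace.
        return [line.strip() for line in handle]
```

Calling read_words('File.txt') would then give you a plain list that supports len(), indexing and remove(), which a file object does not.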
Edit: here is the complete code, which I hope works for you:
import codecs

# Read both files into lists of words, one word per line
wordList = [line.strip() for line in codecs.open('File.txt', 'r').readlines()]
wordList2 = [line.strip() for line in codecs.open('File2.txt', 'r').readlines()]

# Collect every word that matches a later entry in wordList
for x in range(len(wordList)):
    for y in range(x + 1, len(wordList)):
        if wordList[x] == wordList[y]:
            wordList2.append(wordList[x])

# Remove one occurrence from wordList for each collected word
# (remove() raises ValueError if the word is no longer in wordList)
for y in wordList2:
    wordList.remove(y)

# assuming the code above is working
# now write your updated contents
with open('outfile1.txt', 'w') as outfile1:
    for word in wordList:
        outfile1.write(word + '\n')

with open('outfile2.txt', 'w') as outfile2:
    for word in wordList2:
        outfile2.write(word + '\n')
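If the goal is simply that each word appears once, a single pass with a set avoids the nested loops and the repeated remove() calls. This is a different technique from the code above, offered as a rough sketch: split_unique_and_repeated is a hypothetical name, and it assumes the words are already in a list.

```python
def split_unique_and_repeated(words):
    """Return (unique, repeated): each word once, in first-seen order,
    plus the words that occurred more than once (also listed once)."""
    seen = set()        # words encountered so far
    reported = set()    # repeated words already added to `repeated`
    unique = []
    repeated = []
    for word in words:
        if word not in seen:
            seen.add(word)
            unique.append(word)
        elif word not in reported:
            reported.add(word)
            repeated.append(word)
    return unique, repeated
```

The two returned lists could then be written out with the same with open(...) blocks, one word per line.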
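Edit 2: if you want to use a dictionary instead of a list (a dictionary lookup takes O(1) time on average, so you avoid the brute-force pairwise comparison of the list approach):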
wordList = {line.strip():1 for line in codecs.open('File.txt' , 'r').readlines()}
where line.strip() is your key and 1 is your value. To "remove" a word, you can set wordList[word] = 0
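A small sketch of how that flag dictionary might be used. build_word_flags and active_words are hypothetical names, and treating 1 as "keep" and 0 as "removed" is an assumption based on the description here. Since dictionary keys are unique, building the dictionary already collapses repeated words into one entry.

```python
def build_word_flags(lines):
    """Map each stripped line to 1; duplicate words collapse into one key."""
    return {line.strip(): 1 for line in lines}


def active_words(flags):
    """Return the words whose flag is still 1 (not marked as removed)."""
    return [word for word, flag in flags.items() if flag == 1]


flags = build_word_flags(['apple\n', 'banana\n', 'apple\n'])
flags['banana'] = 0            # "remove" a word by clearing its flag
remaining = active_words(flags)
```

If the flag is never used for anything else, a plain set of words would do the same job with less bookkeeping.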