Removing duplicate words from a file using Python

Time: 2017-09-06 01:16:17

Tags: python python-2.7 word

I have a text file in which many words are repeated. I need each word to appear only once.

Here is the code I tried to write:

import  codecs

 wordList = codecs.open('Arquivo.txt' , 'r')
 wordList2 = codecs.open('Arquivo2.txt', 'w')

for x in range(len(wordList)) :
    for y in range(x + 1, len(wordList ) ):
        if wordList[x] == wordList[y]:
            wordList2.append(wordList[x] )
        for y in wordList2:
            wordList.remove(y)

Error:

    wordList2 = codecs.open('File2.txt', 'w').readline()
IOError: File not open for reading

1 answer:

Answer 0 (score: 0)

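Maybe you want to try this. It makes wordList a list instead of a file object, so it can be indexed and passed to len(). You can do the same with wordList2.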

.strip() removes leading and trailing whitespace, including the newline at the end of each line.

wordList =[line.strip() for line in codecs.open('File.txt' , 'r').readlines()]
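The difference between a file object and a list can be sketched as follows. This is only an illustration, assuming one word per line and UTF-8 text; the helper name read_words and its encoding parameter are hypothetical and not part of the original answer.

```python
import codecs

def read_words(path, encoding='utf-8'):
    """Return the lines of `path` as a list of stripped strings."""
    # The with-block closes the file even if reading fails.
    with codecs.open(path, 'r', encoding=encoding) as handle:
        # Iterating over the file yields one line at a time;
        # strip() drops the trailing newline and surrounding spaces.
        return [line.strip() for line in handle]
```

Unlike the file object returned by codecs.open, the list returned here supports len(), indexing and remove().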

Edit: here is the complete code, which I hope works for you

import  codecs

wordList = [line.strip() for line in codecs.open('File.txt' , 'r').readlines()]
wordList2 = [line.strip() for line in codecs.open('File2.txt', 'r').readlines()]
uniqueWords = []
for word in wordList:
    if word not in uniqueWords:
        # first time we see this word: keep it
        uniqueWords.append(word)
    elif word not in wordList2:
        # repeated word: record it once in the duplicates list
        wordList2.append(word)
wordList = uniqueWords

# assuming the code above is working
# now write your updated contents
with open('outfile1.txt','w') as outfile1:
    for word in wordList:
        outfile1.write(word + '\n')

with open('outfile2.txt','w') as outfile2:
    for word in wordList2:
        outfile2.write(word + '\n')

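Edit 2: you may want to use a dictionary instead of a list, because a dictionary lookup takes O(1) time on average, whereas the list version repeatedly compares entries by brute force: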

wordList = {line.strip():1 for line in codecs.open('File.txt' , 'r').readlines()}

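Here line.strip() is the key and 1 is the value. Because keys are unique, repeated words collapse into a single entry. To "delete" a word, you can set its value to 0 with wordList[word] = 0.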
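A minimal sketch of the dictionary idea, assuming one word per line has already been read into a list. The helper names build_word_flags and active_words are hypothetical. collections.OrderedDict is used here as a substitute for the plain dict in the answer, because a plain dict on Python 2.7 does not guarantee that words come back in their original order.

```python
from collections import OrderedDict

def build_word_flags(words):
    """Map each distinct word to 1; repeated words collapse onto one key."""
    flags = OrderedDict()
    for word in words:
        flags[word] = 1
    return flags

def active_words(flags):
    """Return the words whose flag is still 1, in first-seen order."""
    return [word for word, flag in flags.items() if flag == 1]

flags = build_word_flags(['apple', 'pear', 'apple', 'fig', 'pear'])
flags['fig'] = 0  # "delete" a word by clearing its flag
print(active_words(flags))  # ['apple', 'pear']
```

The list returned by active_words can then be written out with the same loop used for outfile1.txt above.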