Question

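I want to open a file and read it line by line. For each line, I want to use the split() method to break the line into a list of words. Then I want to check each word on each line to see whether it is already in a list, and append it to the list if it is not. This is the code I have written.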

fname = raw_input("Enter file name: ")
fh = open(fname)
line1 = list()
for line in fh:
    stuff = line.rstrip().split()
    for word in stuff:
        if stuff not in stuff:
            line1.append(stuff)
print line1

My problem is that when I print line1, it prints about 30 duplicated lists in a format like this.

['But', 'soft', 'what', 'light', 'through', 'yonder', 'window', 'breaks'], 
['But', 'soft', 'what', 'light', 'through', 'yonder', 'window', 'breaks'], ['It', 'is', 'the', 'east', 'and', 'Juliet', 'is', 'the', 'sun'], 
    ['It', 'is', 'the', 'east', 'and', 'Juliet', 'is', 'the', 'sun']
    ['Arise', 'fair', 'sun', 'and', 'kill', 'the', 'envious', 'moon'], 
    ['Arise', 'fair', 'sun', 'and', 'kill', 'the', 'envious', 'moon'],

I would like to know why this happens, and how to remove the duplicate words and lists.

Answer 1

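You have if stuff not in stuff. That tests whether the whole list of words is an element of itself, which it never is, so the condition is always true and the entire list gets appended once for every word on the line. If you change that line to if word not in line1: and the next line to line1.append(word), your code should work.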
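For illustration, here is roughly what that fix looks like when applied to the loop from the question. This is only a sketch under a few assumptions: it uses Python 3 syntax (print as a function) rather than the Python 2 of the question, it takes a list of lines instead of opening a file so that it runs on its own, and unique_words is a made-up helper name.

```python
def unique_words(lines):
    """Collect each distinct word once, in first-seen order."""
    words = []
    for line in lines:
        for word in line.rstrip().split():
            # Compare the individual word against the result list,
            # not the per-line list against itself.
            if word not in words:
                words.append(word)
    return words


# Hypothetical sample input standing in for the lines of the file.
sample = [
    "But soft what light through yonder window breaks\n",
    "It is the east and Juliet is the sun\n",
]
print(unique_words(sample))
```

With a real file you would pass the open file object instead of sample, since iterating over a file also yields one line at a time.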

Alternatively, use a set.

fname = raw_input("Enter file name: ")
fh = open(fname)
line1 = set()  # a set keeps only one copy of each word
for line in fh:
    stuff = line.rstrip().split()
    for word in stuff:
        # add() does nothing if the word is already in the set
        line1.add(word)
print line1

Or even

fname = raw_input("Enter file name: ")
fh = open(fname)
line1 = set()
for line in fh:
    stuff = line.rstrip().split()
    # merge all of this line's words into the set in one step
    line1 = line1.union(set(stuff))
print line1

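Sets contain only unique values (although they have no notion of ordering or indexing), so you do not need to check whether a word has already appeared: the set data type takes care of that automatically.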
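Because a set has no ordering, the words may print in a different order than they appear in the file. If the original order matters, one possible approach (not part of the answer above, just a sketch) is to keep a set for the membership test and a list for the result. The code below assumes Python 3, and unique_words_ordered is a hypothetical helper name.

```python
def unique_words_ordered(lines):
    """Return distinct words in the order they first appear."""
    seen = set()    # fast "have we met this word?" lookups
    ordered = []    # remembers first-seen order
    for line in lines:
        for word in line.split():
            if word not in seen:
                seen.add(word)
                ordered.append(word)
    return ordered


# Hypothetical sample line standing in for the file contents.
sample = ["It is the east and Juliet is the sun\n"]

print(set(sample[0].split()))        # unique words, order not guaranteed
print(unique_words_ordered(sample))  # ['It', 'is', 'the', 'east', 'and', 'Juliet', 'sun']
```

The extra set only serves to keep the membership check cheap on large files; for a short file, checking against the list directly works just as well.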
