Question

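I'm trying to build a list of the unique words from a list of all the words I read from a text file. My only problem is the algorithm I'm using to iterate over the two lists.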

def getUniqueWords(allWords):
    uniqueWords = []
    uniqueWords.append(allWords[0])
    for i in range(len(allWords)):
        for j in range(len(uniqueWords)):
            if allWords[i] == uniqueWords[j]:
                pass
            else:
                uniqueWords.append(allWords[i])
                print uniqueWords[j]
    print uniqueWords
    return uniqueWords

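As you can see, I create an empty list and start iterating over both lists. I also append the first item of the list up front because, for some reason, it won't try to match the words otherwise. I assume that's because list[0] doesn't exist in an empty list. If someone could help me figure out how to iterate over this correctly, I'd be able to generate a nice list of unique words.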

The print uniqueWords[j] is only there for debugging, so I can see what comes up while the list is being processed.

Answer 1

I'm not a Python expert, but I think this should work:

uniqueWords = []
for i in allWords:
    if i not in uniqueWords:
        uniqueWords.append(i)

return uniqueWords

EDIT

I tested it and it works; it returns only the unique words in the list:

def getUniqueWords(allWords):
    uniqueWords = []
    for i in allWords:
        if i not in uniqueWords:
            uniqueWords.append(i)
    return uniqueWords

print(getUniqueWords(['a','b','c','a','b']))

['a', 'b', 'c']
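The membership check in this answer scans the whole result list for every word, so the loop gets slow (quadratic) on large inputs. A minimal sketch of the same loop that also keeps a set of words already seen, so each check is a hash lookup on average while the output keeps first-occurrence order (`seen` is a name introduced here for illustration):

```python
def getUniqueWords(allWords):
    """Return the unique words of allWords in first-occurrence order."""
    seen = set()        # words already emitted; average O(1) membership test
    uniqueWords = []    # output list, keeps the order words first appear in
    for word in allWords:
        if word not in seen:
            seen.add(word)
            uniqueWords.append(word)
    return uniqueWords


print(getUniqueWords(['a', 'b', 'c', 'a', 'b']))  # ['a', 'b', 'c']
```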

Answer 2

I dislike homework questions that (try to) force you to choose a bad algorithm. Better choices would be to use a set or a trie, for example.
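The trie mentioned above is not shown in this answer. A minimal sketch, assuming a plain nested-dict trie (the `END` sentinel and the function name `getUniqueWordsTrie` are illustrative, not from the answer): each word is walked character by character, and it counts as new only if no earlier word ended at the same node.

```python
def getUniqueWordsTrie(allWords):
    """Collect unique words by inserting each one into a nested-dict trie."""
    END = object()     # sentinel key meaning "a word ends at this node"
    root = {}
    uniqueWords = []
    for word in allWords:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})  # walk to (or create) the child for ch
        if END not in node:                 # no earlier word ended exactly here
            node[END] = True
            uniqueWords.append(word)
    return uniqueWords


print(getUniqueWordsTrie(['car', 'cat', 'car', 'ca']))  # ['car', 'cat', 'ca']
```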

You can fix your program with two small changes:

def getUniqueWords(allWords):
    uniqueWords = []
    uniqueWords.append(allWords[0])
    for i in range(len(allWords)):
        for j in range(len(uniqueWords)):
            if allWords[i] == uniqueWords[j]:
                break
        else:
            uniqueWords.append(allWords[i])
            print uniqueWords[j]
    print uniqueWords
    return uniqueWords

First, you need to stop looping as soon as you see that the word is already there:

        for j in range(len(uniqueWords)):
            if allWords[i] == uniqueWords[j]:
                break  # break out of the loop since you found a match

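The second is to use a for/else construct instead of if/else. The else attached to the for runs only when the loop finishes without a break, i.e. only when no match was found, so each new word is appended once instead of once for every non-matching entry: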

        for j in range(len(uniqueWords)):
            if allWords[i] == uniqueWords[j]:
                break
        else:
            uniqueWords.append(allWords[i])
            print uniqueWords[j]
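A standalone illustration of how `for`/`else` behaves: the `else` branch runs only when the loop ends without `break`, including when the loop body never runs at all. The `contains` helper is hypothetical, used only for the demo. Because an empty list also triggers the `else`, this pattern would make the `uniqueWords.append(allWords[0])` seed unnecessary once the debugging `print` (which reads `j`) is removed.

```python
def contains(words, target):
    """Report whether target is in words, using for/else."""
    for w in words:
        if w == target:
            break          # match found: the else block below is skipped
    else:
        return False       # loop completed without break: no match
    return True


print(contains(['a', 'b'], 'b'))  # True
print(contains(['a', 'b'], 'z'))  # False
print(contains([], 'z'))          # False
```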

Answer 3

Maybe you could use the collections.Counter class? (Especially if you also want to count how many times each word occurs in the source document.)

http://docs.python.org/2/library/collections.html?highlight=counter#collections.Counter

from collections import Counter
def getUniqueWords(allWords):
    uniqueWords = Counter()

    for word in allWords:
        uniqueWords[word]+=1
    return uniqueWords.keys()
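`Counter` also accepts an iterable directly, which replaces the manual counting loop while keeping the counts available. A sketch with a made-up word list:

```python
from collections import Counter

allWords = ['the', 'cat', 'sat', 'on', 'the', 'mat', 'the']  # sample data, made up
counts = Counter(allWords)        # tallies every word in a single pass

uniqueWords = list(counts)        # iterating a Counter yields each distinct word once
print(sorted(uniqueWords))        # ['cat', 'mat', 'on', 'sat', 'the']
print(counts['the'])              # 3
print(counts.most_common(1))      # [('the', 3)]
```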

On the other hand, if you only need the unique words and not their counts, just use a set:

def getUniqueWords(allWords):
    uniqueWords = set()

    for word in allWords:
        uniqueWords.add(word)
    return uniqueWords  # if you want to return them as a set
    # OR:
    # return list(uniqueWords)  # if you want to return a list

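If you are restricted to loops and the input is relatively large, a loop plus binary search over a sorted list is a better option than a plain loop with a linear scan. Something like this (note that it returns the words in sorted order):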

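def getUniqueWords(allWords):
    uw = []  # kept sorted so it can be binary-searched
    for word in allWords:
        lo, hi = 0, len(uw) - 1
        m = -1  # index of word in uw, or -1 if not found
        while hi >= lo and m == -1:
            mid = lo + (hi - lo) // 2  # integer division, works in Python 2 and 3
            if uw[mid] == word:
                m = mid
            elif uw[mid] < word:
                lo = mid + 1
            else:
                hi = mid - 1
        if m == -1:
            # not found: lo is the position that keeps uw sorted
            uw.insert(lo, word)
    return uw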
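The standard-library `bisect` module performs the same binary search, so the hand-written `lo`/`hi` loop can be swapped for it. A sketch of that substitution (the function name is illustrative); like the loop above, it returns the words sorted, and `list.insert` still shifts elements, so each insertion remains linear in the list length.

```python
import bisect


def getUniqueWordsBisect(allWords):
    """Keep a sorted list of unique words, using bisect for the lookup."""
    uw = []
    for word in allWords:
        pos = bisect.bisect_left(uw, word)    # leftmost index where word fits
        if pos == len(uw) or uw[pos] != word:
            uw.insert(pos, word)              # absent: insert, keeping uw sorted
    return uw


print(getUniqueWordsBisect(['b', 'a', 'c', 'a', 'b']))  # ['a', 'b', 'c']
```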

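If your input has around 100,000 words, the difference between this and the simple loop is that your PC won't start making noise while it runs the program :)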

Answer 4

You can use a set to get the unique words:

def getUniqueWords(allWords):
    uniqueWords = list({i for i in allWords})
    return uniqueWords

print(getUniqueWords(['a','b','c','a','b']))

Result: ['c', 'a', 'b'] (sets are unordered, so the order you get may differ)
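A set does not keep the input order, which is why the result above comes back shuffled. If order matters, one option (assuming Python 3.7+, where dicts preserve insertion order) is `dict.fromkeys`; the function name here is illustrative.

```python
def getUniqueWordsOrdered(allWords):
    """Deduplicate while keeping first-occurrence order (Python 3.7+)."""
    # dict keys are unique and, since Python 3.7, remember insertion order
    return list(dict.fromkeys(allWords))


print(getUniqueWordsOrdered(['a', 'b', 'c', 'a', 'b']))  # ['a', 'b', 'c']
```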

Find all unique words from a list using loops
