TypeError: unhashable type: 'list' - creating a frequency function

Time: 2016-12-09 19:21:40

Tags: list function python-3.x dictionary methods

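I am taking a text file as input and writing a function that finds the most frequently occurring word. If two or more words are tied for the highest count, I want to print all of them.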

def wordOccurance(userFile):

    userFile.seek(0)
    line = userFile.readline()
    lines = []
    while line != "":
        if line != "\n":
            line = line.lower()  # convert to lower case
            line = line.rstrip("\n")  # strip the trailing newline
            line = line.rstrip("?")  # strip "?" from the end of the line only
            line = line.rstrip("!")  # strip "!" from the end of the line only
            line = line.rstrip(".")  # strip "." from the end of the line only
            line = line.split(" ")  # split the line on spaces (returns a list of words)
            lines.append(line)
        line = userFile.readline() # keep reading lines from document.

    words = lines

    wordDict = {}  # maps each word to its count
    for word in words:
        if word in wordDict.keys():
            wordDict[word] = wordDict[word] + 1
        else:
            wordDict[word] = 1

    largest_value = max(wordDict.values())

    for k in wordDict.keys():
        if wordDict[k] == largest_value:
            print(k)

    return wordDict

Please help me with this function.

1 answer:

Answer 0 (score: 0)

On this line you create a list of strings:

line = line.split(" ")  # split the line on spaces (returns a list of words)

You then append it to a list, so you end up with a list of lists:

lines.append(line)

Later you iterate over that list of lists and try to use each sublist as a key:

for word in words:
    if word in wordDict.keys():  # `word` is a list here, so this lookup already raises the TypeError
        wordDict[word] = wordDict[word] + 1
    else:
        wordDict[word] = 1  # a list (`word`) cannot be used as a dictionary key
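
To see the problem in isolation, here is a minimal reproduction. The sample data and the names `word_dict` and `error_message` are made up for illustration; the loop mirrors the one in the question.

```python
# Hypothetical sample data: two lines that have already been split into words.
lines = [["words", "on", "line", "one"], ["words", "on", "line", "two"]]

word_dict = {}
error_message = None
try:
    for word in lines:  # each `word` is really a whole list of words
        if word in word_dict.keys():  # the lookup has to hash `word`, which fails for a list
            word_dict[word] += 1
        else:
            word_dict[word] = 1
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # the message mentions "unhashable type: 'list'"
```

Dictionary keys must be hashable, and lists are not because they are mutable. Strings are hashable, which is why counting the individual words works.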

A simple fix is to flatten the list of lists first:

words = [item for sublist in lines for item in sublist]

for word in words:
    if word in wordDict.keys():
        wordDict[word] = wordDict[word] + 1
    else:
        wordDict[word] = 1

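The list comprehension [item for sublist in lines for item in sublist] iterates over lines, then over each sublist created by line.split(" "), and returns a new list containing the items of every sublist. In your case, lines might look like this: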

[['words', 'on', 'line', 'one'], ['words', 'on', 'line', 'two']]

The list comprehension turns it into this:

['words', 'on', 'line', 'one', 'words', 'on', 'line', 'two']
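
As a quick check of the flattening step, the sketch below runs the comprehension on the sample data above and compares it with `itertools.chain.from_iterable` from the standard library, which is an alternative I am adding here and not something the question uses. The variable names are only illustrative.

```python
from itertools import chain

# Sample data taken from the example above.
lines = [["words", "on", "line", "one"], ["words", "on", "line", "two"]]

# Flatten with the nested list comprehension.
flat_comprehension = [item for sublist in lines for item in sublist]

# Flatten with the standard library; should give the same result.
flat_chain = list(chain.from_iterable(lines))

print(flat_comprehension)
print(flat_comprehension == flat_chain)  # True
```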

If you want something less complicated, you can use nested loops:

    # words = lines
    # just use `lines` in your for loop instead of creating an identical list 

    wordDict = {}  # maps each word to its count
    for line in lines:
        for word in line:
            if word in wordDict.keys():
                wordDict[word] = wordDict[word] + 1
            else:
                wordDict[word] = 1

    largest_value = max(wordDict.values())

This may be slightly less efficient and/or less "Pythonic", but it may be easier to wrap your head around.
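
For comparison, here is a more compact sketch of the counting step. It swaps the hand-written dictionary for `collections.Counter` from the standard library, which the question does not use; the function name `most_common_words` and the sample data are hypothetical.

```python
from collections import Counter


def most_common_words(lines):
    """Return (list of words tied for the highest count, Counter of all words)."""
    # Count every word in every sublist without building a flat list first.
    counts = Counter(word for line in lines for word in line)
    if not counts:  # empty input: max() would raise on an empty sequence
        return [], counts
    largest_value = max(counts.values())
    top_words = [word for word, count in counts.items() if count == largest_value]
    return top_words, counts


# Hypothetical sample data in the same shape as `lines` in the question.
lines = [["words", "on", "line", "one"], ["words", "on", "line", "two"]]
top_words, counts = most_common_words(lines)
print(top_words)  # ['words', 'on', 'line']
```

Returning the tied words instead of printing them inside the function is a design choice of this sketch; the caller can still print them in a loop as in the question.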

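Also, you may want to consider splitting each line into words before cleaning the data, because if you clean the line first, punctuation is only removed from the end of the line, not from the end of each word. Depending on the nature of your data, however, this may not be necessary.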
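
A rough sketch of that ordering, splitting first and then cleaning each word, could look like this. The helper name `clean_words` is hypothetical, and two details are my own assumptions: it uses `split()` with no argument (so repeated spaces and the trailing newline are handled) and `strip("?!.")`, which removes those characters from both ends of a word.

```python
def clean_words(line):
    """Split a line into lower-case words, then strip ?, ! and . from each word."""
    words = []
    for raw in line.lower().split():  # split on any run of whitespace
        word = raw.strip("?!.")  # remove punctuation at either end of the word
        if word:  # skip tokens that were punctuation only
            words.append(word)
    return words


print(clean_words("Hello there. How are you? Fine!\n"))
# ['hello', 'there', 'how', 'are', 'you', 'fine']
```

With the line-level `rstrip` calls from the question, the same input would keep "there." and "you?" as words, because only the end of the line is cleaned.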