Question

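I'm currently writing a program that takes in a text file and counts the frequency of each word in it, after lowercasing each word and stripping its punctuation.

Here is my code: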

import sys 
import string

incoming =[]
freq =[]
word =[]
count = 0
index = 0
i = 0

with open(sys.argv[1], "r") as word_list:
    for line in word_list:
        #word is the string of the .txt file

        #strips punctuation and lower cases each word
        for words in line.split():
            words = words.translate(string.maketrans("",""), string.punctuation)
            words = words.lower()
            incoming.append(words)
        #incoming is now an array with each element as a word from the file     

    for i in range(len(incoming)-1):
        if (incoming[i]) not in word:
            #WORD[i] = word[index]
            word[index] = incoming[i]
            freq[index] = 1
            index += 1

        else: 
            freq[index] = freq[index] + 1


    for j in word:
        print "%s %d", word[j], freq[j]

I get this error:

  File "wordfreq.py", line 26, in <module>
    word[index] = incoming[i]
IndexError: list assignment index out of range

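But I don't see how it is out of range. As far as I can tell, neither index nor i goes out of range. I'm new to Python and am having a lot of trouble with the 'for' loop syntax. Any tips would be much appreciated.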

Answer 1

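In your code, word[index] really doesn't exist: word is an empty list, and assigning to an index can only replace an element that is already there. What you should do instead is word.append(incoming[i]), and likewise freq.append(1).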
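As a rough sketch of that fix, the counting loop from the question could be written with append as below. This is written for Python 3 (an assumption; the question uses Python 2 syntax), and the function name count_words and the sample input are made up for illustration. It also changes three other things in the original loop that would give wrong results once the IndexError is gone: range(len(incoming)-1) skips the last word, freq[index] in the else branch does not point at the matching word, and "for j in word" yields the words themselves rather than indices.

```python
def count_words(incoming):
    """Count occurrences using two parallel lists, as in the question."""
    word = []  # distinct words, in order of first appearance
    freq = []  # freq[k] holds the count for word[k]
    for w in incoming:  # iterate over every element, including the last one
        if w not in word:
            # append grows the list; word[index] = w on an empty list raises IndexError
            word.append(w)
            freq.append(1)
        else:
            # look up the slot of the matching word instead of reusing a running index
            freq[word.index(w)] += 1
    return word, freq


# Hypothetical sample input standing in for the cleaned words from the file.
word, freq = count_words(["the", "cat", "the", "hat"])
for w, n in zip(word, freq):
    print("%s %d" % (w, n))
```

Both "w not in word" and word.index(w) scan the list, so this stays close to the original code but gets slow on large files.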

Answer 2

A better approach might be to use a defaultdict:

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for i in ["abc", "abc", "def"]:
...     d[i] += 1
...
>>> d
defaultdict(<type 'int'>, {'abc': 2, 'def': 1})
>>>

This is a more idiomatic way of counting frequencies than maintaining indices yourself. The words are in d.keys() and their frequencies are in d.values().
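Applied to the original problem, the defaultdict approach might look like the sketch below. It assumes Python 3, where str.maketrans("", "", string.punctuation) replaces the two-argument translate call used in the question; the function name word_frequencies and the sample lines are invented for illustration.

```python
import string
from collections import defaultdict


def word_frequencies(lines):
    """Return a mapping of lowercased, punctuation-free words to their counts."""
    # Translation table that deletes every ASCII punctuation character.
    table = str.maketrans("", "", string.punctuation)
    counts = defaultdict(int)  # missing keys start at 0
    for line in lines:
        for token in line.split():
            cleaned = token.translate(table).lower()
            if cleaned:  # skip tokens that were nothing but punctuation
                counts[cleaned] += 1
    return counts


# Hypothetical sample input; any iterable of strings works.
sample = ["The cat, the hat.", "A cat!"]
counts = word_frequencies(sample)
for w, n in sorted(counts.items()):
    print("%s %d" % (w, n))
```

Because a file object iterates over its lines, the same function could be called as word_frequencies(word_list) inside the "with open(sys.argv[1]) as word_list:" block from the question.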
