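I'm currently writing a program that takes a text file and counts the frequency of each word in it, after lowercasing each word and stripping its punctuation.
Here is my code: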
import sys
import string
incoming =[]
freq =[]
word =[]
count = 0
index = 0
i = 0
with open(sys.argv[1], "r") as word_list:
    for line in word_list:
        #word is the string of the .txt file
        #strips punctuation and lower cases each word
        for words in line.split():
            words = words.translate(string.maketrans("",""), string.punctuation)
            words = words.lower()
            incoming.append(words)
        #incoming is now an array with each element as a word from the file
for i in range(len(incoming)-1):
    if (incoming[i]) not in word:
        #WORD[i] = word[index]
        word[index] = incoming[i]
        freq[index] = 1
        index += 1
    else:
        freq[index] = freq[index] + 1
for j in word:
    print "%s %d", word[j], freq[j]
I get this error:
File "wordfreq.py", line 26, in <module>
word[index] = incoming[i]
IndexError: list assignment index out of range
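But I don't see how it is out of range. As far as I can tell, neither index nor i goes out of range. I'm new to Python and have been having a lot of trouble with the 'for' loop syntax. Any tips would be much appreciated.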
Answer 0 (score: 1)
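In your code, word[index] really does not exist: word starts out as an empty list, so there is no slot to assign to. What you should do is word.append(incoming[i]), and likewise freq.append(1).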
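A minimal sketch of that fix, assuming Python 3 (the question's code targets Python 2) and wrapping the loop in a hypothetical helper, count_words, that is not part of the original code. It also looks up the position of an already-seen word with word.index, because the original else branch updates freq[index], which points one past the last stored entry:

```python
def count_words(incoming):
    """Count occurrences using two parallel lists, as in the question.

    count_words is a hypothetical helper name used only for illustration.
    """
    word = []  # distinct words, in first-seen order
    freq = []  # freq[k] is the count for word[k]
    for w in incoming:  # iterate over the items directly, no index needed
        if w not in word:
            # Lists grow with append; assigning to a missing index raises IndexError.
            word.append(w)
            freq.append(1)
        else:
            # Find where this word was stored and bump its count.
            freq[word.index(w)] += 1
    return word, freq


words, counts = count_words(["abc", "abc", "def"])
for w, c in zip(words, counts):
    print("%s %d" % (w, c))
```

Iterating over incoming directly also avoids range(len(incoming)-1), which would skip the last word.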
Answer 1 (score: 1)
A better approach might be to use defaultdict:
>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for i in ["abc", "abc", "def"]:
... d[i] += 1
...
>>> d
defaultdict(<type 'int'>, {'abc': 2, 'def': 1})
>>>
This is a more idiomatic way to count frequencies than maintaining indexes yourself. The words are in d.keys() and their frequencies are in d.values().
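Putting this together with the lowercasing and punctuation stripping from the question, a rough sketch could look like the following. It assumes Python 3, where str.maketrans replaces the Python 2 string.maketrans call, and the function name word_frequencies and the sample lines are made up for illustration:

```python
import string
from collections import defaultdict


def word_frequencies(lines):
    """Return a dict mapping each normalized word to its count.

    word_frequencies is a hypothetical name, not from the original post.
    """
    # Translation table that deletes every ASCII punctuation character.
    strip_punct = str.maketrans("", "", string.punctuation)
    counts = defaultdict(int)  # missing keys start at 0
    for line in lines:
        for raw in line.split():
            w = raw.translate(strip_punct).lower()
            if w:  # skip tokens that consisted only of punctuation
                counts[w] += 1
    return counts


sample = ["Hello, world!", "hello again -- World."]
freqs = word_frequencies(sample)
for w in sorted(freqs):
    print("%s %d" % (w, freqs[w]))
```

Because an open file yields its lines when iterated, the same function should work on a file object, for example inside with open(sys.argv[1]) as f: by calling word_frequencies(f).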