Looping through an array - Python

Time: 2015-11-10 01:36:14

Tags: python arrays

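I'm currently writing a program that takes in a text file and counts the frequency of each word in it, after lowercasing each word and stripping its punctuation.

Here is my code: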

import sys 
import string

incoming =[]
freq =[]
word =[]
count = 0
index = 0
i = 0

with open(sys.argv[1], "r") as word_list:
    for line in word_list:
        #word is the string of the .txt file

        #strips punctuation and lower cases each word
        for words in line.split():
            words = words.translate(string.maketrans("",""), string.punctuation)
            words = words.lower()
            incoming.append(words)
        #incoming is now an array with each element as a word from the file     

    for i in range(len(incoming)-1):
        if (incoming[i]) not in word:
            #WORD[i] = word[index]
            word[index] = incoming[i]
            freq[index] = 1
            index += 1

        else: 
            freq[index] = freq[index] + 1


    for j in word:
        print "%s %d", word[j], freq[j]

I get the error:

  File "wordfreq.py", line 26, in <module>
    word[index] = incoming[i]
IndexError: list assignment index out of range

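But I don't see how it is out of range. As far as I can tell, neither index nor i goes out of range. I'm new to Python and have been having a lot of trouble with the 'for' loop syntax. Any tips would be greatly appreciated.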

2 Answers:

Answer 0: (score: 1)

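In your code, word[index] does not actually exist yet, because word starts out as an empty list. What you should do is append to the list instead, for example word.append(incoming[i]).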
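
A minimal sketch of that fix, written for Python 3 and assuming the same parallel-list layout as the question. The function name count_words is hypothetical. It also looks up the slot of an already-seen word with list.index, since incrementing freq[index] in the else branch would point one past the end of freq, and it iterates over the items directly, so the last word is not skipped the way range(len(incoming)-1) would skip it.

```python
def count_words(incoming):
    """Count words using two parallel lists, as in the question."""
    word = []  # distinct words, in order of first appearance
    freq = []  # freq[k] is the count for word[k]
    for w in incoming:  # iterate over the items directly; no index needed
        if w not in word:
            # append grows the list; assigning to word[len(word)] raises IndexError
            word.append(w)
            freq.append(1)
        else:
            # find the slot of the word already seen and bump its count
            freq[word.index(w)] += 1
    return word, freq


words, counts = count_words(["abc", "abc", "def"])
for w, c in zip(words, counts):
    # the % operator does the formatting; a comma would just print a tuple
    print("%s %d" % (w, c))
```

Note that both `w not in word` and `word.index(w)` scan the list, so this stays simple but gets slow on large inputs.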

Answer 1: (score: 1)

A better approach might be to use defaultdict:

>>> from collections import defaultdict
>>> d = defaultdict(int)
>>> for i in ["abc", "abc", "def"]:
...     d[i] += 1
...
>>> d
defaultdict(<type 'int'>, {'abc': 2, 'def': 1})
>>>

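This is a more idiomatic way of counting frequencies than maintaining indexes yourself. The words are in d.keys() and their frequencies are in d.values().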
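
Combined with the preprocessing from the question, a sketch in Python 3 might look like the following. The helper name word_frequencies is hypothetical, and str.maketrans stands in for the Python 2 string.maketrans call used in the question.

```python
import string
from collections import defaultdict

# Translation table that deletes every ASCII punctuation character (Python 3)
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def word_frequencies(lines):
    """Map each lowercased, punctuation-free word to its count."""
    counts = defaultdict(int)  # missing keys start at 0
    for line in lines:
        for token in line.split():
            cleaned = token.translate(_STRIP_PUNCT).lower()
            if cleaned:  # skip tokens that were punctuation only
                counts[cleaned] += 1
    return counts


freqs = word_frequencies(["Hello, world!", "hello again."])
for w, c in freqs.items():  # items() pairs each word with its count
    print("%s %d" % (w, c))
```

Because iterating over an open file yields its lines, the same function should also accept a file object, for example inside `with open(sys.argv[1]) as f:`.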