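TypeError: list indices must be integers or slices, not str when using nested dictionaries

Time: 2017-11-11 03:45:19

Tags: python python-3.x tf-idf

I am using a nested dictionary to build an inverted index of text files stored locally. The abstract structure of the inverted index is shown below (the values are integers). For any word, the value under key '0' is the idf and the value under key '1' is the tf.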

inverted_index={'word1':{'0':idf_value, '1': 2 , 'filename1': frequency_value, 'filename2': frequency_value},'word2':{'0':idf_value, '1': 2, 'filename1': frequency_value, 'filename2': frequency_value}}

Here is the code:

import textract, math, os
docs=[]
#Read the files and store them in docs
folder = os.listdir("./input/")
for file in folder:
    if file.endswith("txt"):
        docs.append ([file,textract.process("./input/"+file)])

inverted_index={}
for doc in docs:
    words=doc[1].decode()
    words=words.split(" ")

    #loop through and build the inverted index
    for word in words:
        temp={}
        #to remove initial white space
        if (word == " ") or (word==""):
            continue
        if word not in inverted_index:
            temp[doc[0]]=1
            temp['0']=0 #idf
            temp['1']=1 #tf
            inverted_index[word]=temp
        else:
            if doc[0] not in inverted_index[word].keys():
                inverted_index[word][doc[0]]=1
                inverted_index[word]['1']=inverted_index[word]['1']+1
            else:
                inverted_index[word][doc[0]]=inverted_index[word][doc[0]]+1

# to sort and print values while calculating the tf and idf on the fly
for key, value in sorted(inverted_index.items()): # to sort words alphabetically
    inverted_index[key]=sorted(inverted_index[key]) # to sort the filenames where the word occurred.
    inverted_index[key]['0']=math.log2(len(docs)/value['1']) # the error in this line
    print(key, value)

But I get this error on the line marked in the last loop:

Traceback (most recent call last):
  File "aaaa.py", line 34, in <module>
    inverted_index[key]['0']=math.log2(len(docs)/value['1']) 
TypeError: list indices must be integers or slices, not str

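Could you help me fix this bug? Thanks.

2 Answers:

Answer 0: (score: 0)

The error comes from inverted_index[key]['0'], because inverted_index[key] = sorted(inverted_index[key]) replaces the inner dictionary with a sorted list of its keys, so that: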

print(inverted_index[key])
# becomes ['0', '1', 'filename1', 'filename2']

This triggers the TypeError, because a list cannot be indexed with a string.

To change the ['0'] value of every word's inner dictionary, you can try the following code:
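
A minimal reproduction may make this easier to see. The inner dictionary below is made up for illustration (the file names and numbers are not from the question's data); it only mirrors the layout described in the question.

```python
# Hypothetical inner dictionary shaped like the one in the question.
inner = {'0': 0, '1': 2, 'filename2': 3, 'filename1': 1}

# sorted() iterates over the dict's keys and returns them as a new list.
keys = sorted(inner)
print(type(keys).__name__, keys)

# Indexing that list with a string raises the TypeError from the traceback.
message = None
try:
    keys['1']
except TypeError as exc:
    message = str(exc)
    print(message)
```

The original dictionary `inner` is untouched by `sorted()`; the problem only appears once the returned list is stored in place of the dictionary.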

for key, value in sorted(inverted_index.items()):
    # value still refers to the inner dict, so update it directly.
    # Do not assign sorted(inverted_index[key]) back to inverted_index[key]:
    # that replaces the dict with a list of its keys.
    value['0'] = 'some_value'

print(inverted_index)


Answer 1: (score: 0)

This worked for me:

for key, value in sorted(inverted_index.items()):
    inverted_index[key]=sorted(inverted_index[key])
    value['0']=math.log2(len(docs)/value['1']) # index the dict (value), not the sorted list
    inverted_index[key]=value
    print(key, value)
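
As a rough sketch of the same idea without ever replacing the inner dictionary: the loop below updates each entry in place and sorts only a copy for printing. The sample index and `num_docs` (standing in for `len(docs)`) are assumed values for illustration, not data from the question.

```python
import math

# Hypothetical sample data in the question's layout:
# '0' holds the idf, '1' the number of documents containing the word.
inverted_index = {
    'word2': {'0': 0, '1': 1, 'filename2': 4},
    'word1': {'0': 0, '1': 2, 'filename2': 1, 'filename1': 3},
}
num_docs = 2  # stands in for len(docs) in the question

for word, entry in sorted(inverted_index.items()):  # words in alphabetical order
    # entry is the inner dict itself, so this updates inverted_index[word].
    entry['0'] = math.log2(num_docs / entry['1'])
    # Sort a copy of the items for display; the stored dict is left intact.
    print(word, dict(sorted(entry.items())))
```

Because the keys '0' and '1' sort before the file names, the printed view lists the idf and the document count first, followed by the file names in alphabetical order.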