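I am trying to populate an inverted index from a corpus that my program retrieved from a given website. I get
AttributeError: 'dict' object has no attribute 'encode'
but I don't understand why. As far as I can tell, I am iterating over the dictionary and populating the inverted index correctly, so I need help! Below is the code that produces the error. I believe the fix should be a small change, but I could be wrong... Code: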
def add(self, doc):
    for token in self.dict:
        token = token.encode("utf-8")
        if token in doc["title"].encode("utf-8") or token in doc["text"].encode("utf-8"):
            if doc["docId"] not in self.index[token]:
                self.index[token].append(doc["docId"])
    self.documents[doc["docId"]] = doc

def create_index(self):
    for doc in self.corpus:
        self.add(doc)
Here is an example of the corpus format:
{
    "docId": 169,
    "title": "CSI 7901 Études dirigées / Directed Studies (3 crédits / 3 units)",
    "text": "Ce cours est équivalent à COMP 6901 à la Carleton University. / This course is equivalent to COMP 6901 at Carleton University."
},
Edit: here is the class this code lives in (the inverted index):
class Index:
    def __init__(self):
        self.index = defaultdict(list)
        self.documents = {}
        self.__unique_id = 0
        with open("C:\Users\judyc\OneDrive\Documents\GitHub\MatteosMind\src\output\corpus.json",'rb') as dict_file:
            self.dict = json.load(dict_file)
        with open("C:\Users\judyc\OneDrive\Documents\GitHub\MatteosMind\src\output\corpus.json") as corpus_file:
            self.corpus = json.load(corpus_file)
As requested, here is the full error message:
Traceback (most recent call last):
  File "./src/main.py", line 39, in <module>
    main()
  File "./src/main.py", line 28, in main
    index.create_index()
  File ".\src\invertedindex.py", line 28, in create_index
    self.add(doc)
  File ".\src\invertedindex.py", line 20, in add
    token = token.encode("utf-8")
AttributeError: 'dict' object has no attribute 'encode'
AttributeError: 'dict' object has no attribute 'encode'
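
For reference, the error can be reproduced outside the class. This is only a minimal sketch under an assumption drawn from the code shown: self.dict and self.corpus are both loaded from corpus.json, so iterating self.dict would yield whole document dicts rather than token strings. The two-document corpus, reproduce_error and build_index are hypothetical names made up for illustration, and the whitespace tokenization is a stand-in, not the project's actual vocabulary. The sketch is written for Python 3.

```python
import json
from collections import defaultdict

# Hypothetical two-document corpus in the same shape as the example above.
corpus_json = '''[
  {"docId": 1, "title": "Directed Studies", "text": "This course is equivalent to COMP 6901."},
  {"docId": 2, "title": "Algorithms", "text": "Design and analysis of algorithms."}
]'''

corpus = json.loads(corpus_json)


def reproduce_error(items):
    """Call .encode on each item, as add() does, and return the error text if it fails."""
    try:
        for token in items:
            token.encode("utf-8")
    except AttributeError as exc:
        return str(exc)
    return None


# Each item of the parsed corpus is a dict (one document), not a string.
error_message = reproduce_error(corpus)


def build_index(docs):
    """Map each lowercased, whitespace-separated token to the docIds that contain it.

    Assumes docIds are unique; punctuation is left attached to tokens.
    """
    index = defaultdict(list)
    for doc in docs:
        words = (doc["title"] + " " + doc["text"]).lower().split()
        for token in set(words):
            index[token].append(doc["docId"])
    return index


index = build_index(corpus)
print(error_message)
print(index["course"])
```

If that assumption holds, loading the token list from a separate dictionary file (or deriving tokens from the documents, as build_index does here) would give add() strings to work with instead of dicts.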