AttributeError when populating an inverted index

Time: 2019-04-07 20:32:16

Tags: json python-2.7 dictionary inverted-index

I'm trying to populate an inverted index from a corpus that my program retrieves from a given website, but I get

AttributeError: 'dict' object has no attribute 'encode'

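and I don't understand why. As far as I can tell, I'm correctly using a dictionary to iterate over the inverted index and populate it, so I need help! Below is the code that produces the error. I believe the fix should be a small change, but I could be wrong... Code: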

def add(self, doc):
    for token in self.dict:
        token = token.encode("utf-8")
        if token in doc["title"].encode("utf-8") or token in doc["text"].encode("utf-8"):
            if doc["docId"] not in self.index[token]:
                self.index[token].append(doc["docId"])
                self.documents[doc["docId"]] = doc

def create_index(self):
    for doc in self.corpus:
        self.add(doc)
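A minimal sketch of what `add` and `create_index` appear to be aiming for, assuming the vocabulary is a plain list of token strings. The function name `build_index` and the `vocabulary` argument are hypothetical. The sketch targets Python 3, where JSON strings are already text, so the `.encode("utf-8")` calls are dropped; it keeps the original substring test.

```python
from collections import defaultdict


def build_index(vocabulary, corpus):
    """Map each token in `vocabulary` to the docIds whose title or text contains it.

    Assumes `vocabulary` is an iterable of strings and `corpus` is a list of
    dicts with "docId", "title" and "text" keys.
    """
    index = defaultdict(list)   # token -> list of docIds
    documents = {}              # docId -> document dict
    for doc in corpus:
        for token in vocabulary:
            # Substring test, as in the original add().
            if token in doc["title"] or token in doc["text"]:
                if doc["docId"] not in index[token]:
                    index[token].append(doc["docId"])
                    documents[doc["docId"]] = doc
    return index, documents


corpus = [
    {"docId": 169,
     "title": "CSI 7901 Études dirigées / Directed Studies (3 crédits / 3 units)",
     "text": "Ce cours est équivalent à COMP 6901 à la Carleton University."},
]
index, documents = build_index(["Carleton", "Études", "physics"], corpus)
print(dict(index))  # {'Carleton': [169], 'Études': [169]}
```

A substring test is loose (the token "cours" would also match "course"); splitting title and text into words first would give exact-word matches.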

Here is a sample of the corpus format:

    {
        "docId": 169,
        "title": "CSI 7901 Études dirigées / Directed Studies (3 crédits / 3 units)",
        "text": "Ce cours est équivalent à COMP 6901 à la Carleton University. / This course is equivalent to COMP 6901 at Carleton University."
    },
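Assuming `corpus.json` holds a JSON array of objects shaped like the sample (the enclosing brackets are not shown above, so this is an assumption), `json.load` returns a list of dicts. Iterating over that list therefore yields whole documents, not strings, as this Python 3 check shows:

```python
import json

# Hypothetical stand-in for corpus.json: a JSON array holding the sample document.
raw = """[
    {
        "docId": 169,
        "title": "CSI 7901 Études dirigées / Directed Studies (3 crédits / 3 units)",
        "text": "Ce cours est équivalent à COMP 6901 à la Carleton University."
    }
]"""

corpus = json.loads(raw)
for item in corpus:
    # Each element is a dict keyed by "docId", "title" and "text", not a token string.
    print(type(item).__name__, sorted(item))  # dict ['docId', 'text', 'title']
```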

Edit: here is the class these methods belong to (the inverted index):

import json
from collections import defaultdict

class Index:
    def __init__(self):
        self.index = defaultdict(list)   # token -> list of docIds
        self.documents = {}              # docId -> document dict
        self.__unique_id = 0
        # Raw strings keep the Windows backslashes literal.
        with open(r"C:\Users\judyc\OneDrive\Documents\GitHub\MatteosMind\src\output\corpus.json", 'rb') as dict_file:
            self.dict = json.load(dict_file)
        with open(r"C:\Users\judyc\OneDrive\Documents\GitHub\MatteosMind\src\output\corpus.json") as corpus_file:
            self.corpus = json.load(corpus_file)
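In `__init__`, `self.dict` and `self.corpus` are both loaded from the same `corpus.json`, so `self.dict` appears to be the same list of document dicts, and `for token in self.dict` in `add` hands each whole document to `token.encode`, which matches the traceback. If there is no separate vocabulary file, one option is to derive the vocabulary from the corpus itself. The `build_vocabulary` helper below is hypothetical, and its simple `\w+` regex tokenizer is an assumption, not something the question specifies:

```python
import re

# \w+ with re.UNICODE keeps accented words such as "Études" intact.
TOKEN_RE = re.compile(r"\w+", re.UNICODE)


def build_vocabulary(corpus):
    """Collect the distinct lower-cased word tokens from every title and text.

    Hypothetical helper: assumes each document is a dict with "title" and
    "text" string fields, as in the sample corpus.
    """
    vocabulary = set()
    for doc in corpus:
        for field in ("title", "text"):
            vocabulary.update(t.lower() for t in TOKEN_RE.findall(doc[field]))
    return sorted(vocabulary)


corpus = [{"docId": 169, "title": "Directed Studies", "text": "Carleton University"}]
print(build_vocabulary(corpus))  # ['carleton', 'directed', 'studies', 'university']
```

Setting `self.dict = build_vocabulary(self.corpus)` after the corpus is loaded would give `add` a list of strings to iterate over. Because these tokens are lower-cased, the membership test in `add` would also need to compare against a lower-cased title and text.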

Here is the full error message, as requested:

Traceback (most recent call last):
  File "./src/main.py", line 39, in <module>
    main()
  File "./src/main.py", line 28, in main
    index.create_index()
  File ".\src\invertedindex.py", line 28, in create_index
    self.add(doc)
  File ".\src\invertedindex.py", line 20, in add
    token = token.encode("utf-8")
AttributeError: 'dict' object has no attribute 'encode'
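The final frame points at `token.encode("utf-8")`. `encode` is a string method that dicts do not have, so the error means `token` is a dict at that point. The same error can be reproduced directly (Python 3):

```python
doc = {"docId": 169, "title": "Directed Studies", "text": "Carleton University"}
try:
    doc.encode("utf-8")  # dict has no encode method
except AttributeError as exc:
    print(exc)  # 'dict' object has no attribute 'encode'
```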

0 answers:

No answers