Question

I am trying to implement a basic information retrieval system:

import os
from collections import defaultdict

# tokenize() is the asker's own helper; its definition is not shown in the question.

class IRSystem:
    """A very simple Information Retrieval System. The constructor
    s = IRSystem() builds an empty system. Next, index several documents
    with s.index_document(ID, words), where words is a list of tokens.
    """

    def __init__(self):
        "Initialize an IR system."
        self.tdf = defaultdict(set)  # word -> set of IDs of documents containing it
        self.doc_ids = []

    def index_document(self, doc_id, words):
        "Add a new unindexed document to the system."
        self.doc_ids.append(doc_id)
        for word in words:
            self.tdf[word].add(doc_id)
        return self.tdf

    def index_collection(self, filenames):
        "Index a collection of documents."
        for filename in filenames:
            with open(filename) as f:  # close each file after reading it
                self.index_document(os.path.basename(filename),
                                    tokenize(f.read()))

    def query(self, *terms):  # *terms gathers the positional arguments into a tuple
        "Query the system for documents in which all terms occur."
        set_list = []
        for term in terms:  # each term is used as a dict key, so it must be hashable
            set_list.append(self.tdf[term])
        return set.intersection(*set_list)

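There is a problem with my query method. Suppose I have a collection of 3 texts. The index_collection method fills the tdf dictionary with (word, set of texts in which the word occurs) entries, one file (text document) at a time.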
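The tdf dictionary described above is an inverted index: each word maps to the set of documents that contain it. A minimal sketch of that structure, assuming three already-tokenized in-memory documents (the file names and contents below are hypothetical stand-ins for the .txt files):

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each word to the set of document IDs that contain it.

    `docs` maps a document ID to its list of tokens (hypothetical,
    already-tokenized input standing in for the files on disk).
    """
    tdf = defaultdict(set)
    for doc_id, words in docs.items():
        for word in words:
            tdf[word].add(doc_id)
    return tdf


# Hypothetical three-document collection.
docs = {
    "a.txt": ["ora", "pioggerellina"],
    "b.txt": ["ora", "caduta"],
    "c.txt": ["pioggerellina", "caduta", "ora"],
}
index = build_inverted_index(docs)
print(sorted(index["ora"]))     # ['a.txt', 'b.txt', 'c.txt']
print(sorted(index["caduta"]))  # ['b.txt', 'c.txt']
```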

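Given a list of terms, the query method first creates an empty list and then, for each term, appends the corresponding value from tdf (which should be a set) to that list. If I do: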
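The query strategy described above (one set per term, then keep only the documents present in every set) can be sketched on its own. The postings data and the function name all_terms_query are hypothetical; unlike the original method, this sketch uses dict.get so that an unknown term yields an empty set without being inserted, and it guards against calling set.intersection with no arguments, which raises TypeError:

```python
# Each term maps to the set of documents containing it (hypothetical data).
postings = {
    "ora": {"a.txt", "b.txt", "c.txt"},
    "pioggerellina": {"a.txt", "c.txt"},
    "caduta": {"b.txt", "c.txt"},
}


def all_terms_query(postings, *terms):
    # Collect one set per term (empty set for unknown terms).
    set_list = [postings.get(term, set()) for term in terms]
    if not set_list:
        return set()  # set.intersection() with no arguments would raise TypeError
    # Keep only the documents that appear in every set.
    return set.intersection(*set_list)


print(sorted(all_terms_query(postings, "ora", "caduta")))  # ['b.txt', 'c.txt']
```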

s.index_collection(glob.glob("*.txt"))
terms = ["ora","pioggerellina","caduta"]
s.query(terms)

I get TypeError: unhashable type: 'list'
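The same error can be reproduced in isolation: looking up a list as a key in a dict (including a defaultdict) fails, because lists are mutable and therefore unhashable. The key is hashed before defaultdict would create a missing entry, so the default set is never produced. A minimal sketch with hypothetical data:

```python
from collections import defaultdict

tdf = defaultdict(set)
tdf["ora"].add("a.txt")

term = ["ora", "pioggerellina", "caduta"]  # a list, not a single string

caught = None
try:
    tdf[term]  # hashing the list key fails before any lookup happens
except TypeError as exc:
    caught = exc

print(caught)  # the message mentions: unhashable type: 'list'
```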

The problem is in:

set_list.append(self.tdf[term])

But I don't understand why: self.tdf[term] is a set, and I'm just trying to append that set to the list. What am I missing? Thanks.
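A likely cause, shown as a minimal sketch: query is declared as def query(self, *terms), and *terms gathers all positional arguments into a tuple. Calling s.query(terms) with a list therefore passes that whole list as one argument, so inside the loop term is the list itself and self.tdf[term] tries to hash it. Unpacking at the call site with s.query(*terms) passes each string separately; alternatively the method could be declared as def query(self, terms). The helper show_terms below is hypothetical and only mirrors the signature:

```python
def show_terms(*terms):
    # *terms collects all positional arguments into a tuple.
    return terms


words = ["ora", "pioggerellina", "caduta"]
print(show_terms(words))   # (['ora', 'pioggerellina', 'caduta'],)  one element: the whole list
print(show_terms(*words))  # ('ora', 'pioggerellina', 'caduta')  three string elements
```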

Appending dictionary values to a list, getting TypeError: unhashable type: 'list'
