Question

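I'm a beginner using Python NLTK to create an inverted index for information retrieval.

I have successfully written a function, makeInvertedIndex, that takes the dict variable rdes_list as input and returns an inverted index dictionary. For example: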

Input rdes_list = {1: 'hello world', 2: 'hello', 3: 'hello cat', 4: 'hellolot of cats'}

Output index_dict = {'hello': [0, 1, 2], 'cat': [2], 'of': [3], 'world': [0], 'cats': [3], 'hellolot': [3]}
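
The body of makeInvertedIndex is not shown in the question. A minimal sketch that reproduces the example output might look like the following. It assumes whitespace tokenization (nltk.word_tokenize could be swapped in) and that documents are numbered from 0 in key order, which is what the example output suggests.

```python
from collections import defaultdict

def makeInvertedIndex(rdes_list):
    # Hypothetical reconstruction: the asker's own implementation is not shown.
    index = defaultdict(list)
    # Number documents 0, 1, 2, ... in key order, matching the example output
    for doc_no, key in enumerate(sorted(rdes_list)):
        # Assumption: split on whitespace; set() avoids listing a document twice
        for word in set(rdes_list[key].split()):
            index[word].append(doc_no)
    return dict(index)

rdes_list = {1: 'hello world', 2: 'hello', 3: 'hello cat', 4: 'hellolot of cats'}
index_dict = makeInvertedIndex(rdes_list)
print(index_dict['hello'])  # [0, 1, 2]
```

Each posting list ends up sorted because documents are visited in increasing order.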

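Building on the function above, I'm having trouble writing two other functions. The first is orSearch(invertedIndex, query), which takes the inverted index (i.e. index_dict) and a query (i.e. a list of words) and returns the set of document numbers of all documents that contain any of the words in the query.

The second is andSearch(invertedIndex, query), which takes the inverted index (i.e. index_dict) and a query (i.e. a list of words) and returns the set of document numbers of all documents that contain all of the words in the query.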

Answer 1

I propose the following solution:

output_index_dict = {'hello': [0, 1, 2], 'cat': [2], 'of': [3], 'world': [0], 'cats': [3], 'hellolot': [3]}

def orSearch(invertedIndex, query):
    relevant_documents = set()
    # Look up each query word directly instead of scanning the whole index
    for word in query:
        # Words missing from the index contribute no documents
        relevant_documents.update(invertedIndex.get(word, []))
    return relevant_documents

>>> orSearch(output_index_dict, ['of', 'hello', 'cat'])
output : {0, 1, 2, 3}

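def andSearch(invertedIndex, query):
    # Look up every query word; a word missing from the index matches no document
    result = [set(invertedIndex.get(word, [])) for word in query]
    # An empty query matches nothing; otherwise keep documents common to all words
    return set.intersection(*result) if result else set()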

>>> andSearch(output_index_dict, ['hellolot', 'of', 'cats'])
output : {3}
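
As a sanity check, a brute-force scan over the raw documents gives reference results that any index-based search should agree with. The sketch below is only illustrative: the names bruteOrSearch and bruteAndSearch are hypothetical, documents are assumed to be numbered from 0, and words are split on whitespace.

```python
docs = ['hello world', 'hello', 'hello cat', 'hellolot of cats']

def bruteOrSearch(docs, query):
    # Keep a document if it shares at least one word with the query
    return {i for i, text in enumerate(docs) if set(text.split()) & set(query)}

def bruteAndSearch(docs, query):
    # Keep a document only if every query word occurs in it
    return {i for i, text in enumerate(docs) if set(query) <= set(text.split())}

print(bruteOrSearch(docs, ['of', 'hello', 'cat']))       # {0, 1, 2, 3}
print(bruteAndSearch(docs, ['hellolot', 'of', 'cats']))  # {3}
print(bruteAndSearch(docs, ['hello', 'dog']))            # set()
```

The last call uses a word ('dog') that appears in no document, so an AND search should return an empty set for it.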

I hope I haven't missed anything in your request.
