Question

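I have a list of sentences and a list of queries. Each query consists of space-separated words, and for each query I have to find the sentences that contain all of its words and print the indices of those sentences. Example: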

3
hey how are you
how do you do
how are you doing
2
how
how are

Output:

0 1 2
0 2

The input is structured as follows:

sentences = ['hey how are you' , 'how do you do' , 'how are you doing']
queries = ['how', 'how are']

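I've been using an O(n^3) algorithm, but it is very slow and gives me a TLE (time limit exceeded). Is there a faster way, perhaps with a regular expression? I haven't been able to figure out how to construct such an expression.

The input size is limited to 10^4.

My code: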

def textQueries(sentences, queries):
    def maptoDict(sentence):
        d = {}
        for word in sentence.split():
            if word not in d.keys():
                d[word] = 1
            else:
                d[word] += 1
        return d

    s = list(map(maptoDict, sentences))
    q = list(set(query.split()) for query in queries)
    for query in q:
        res = []
        for i in range(len(s)):
            if query.issubset(set(s[i].keys())):
                res.append(i)
        if not len(res):
            res.append(-1)
        for r in res:
            print(r, end=' ')
        print()

Answer 1

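Python supports a data structure called set. You can process the sentences up front to build a map from each word to the sentences that contain it.

The map looks like this: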

word_in_sentences["how"] = {0, 1, 2}

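With that data structure, you can compute the set intersection over all the words in a query. This gives you the set of sentences that contain every word in the query, regardless of word order.

Once the sentences have been filtered down to a smaller group, any order-sensitive search over them should be faster.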
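The word map and intersection described above could be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the function names `build_word_index` and `text_queries` are made up for this example, and treating an empty query as "no match" (-1) is an assumption, since the question does not say what should happen in that case.

```python
from collections import defaultdict


def build_word_index(sentences):
    """Map each word to the set of indices of the sentences containing it."""
    word_in_sentences = defaultdict(set)
    for index, sentence in enumerate(sentences):
        for word in sentence.split():
            word_in_sentences[word].add(index)
    return word_in_sentences


def text_queries(sentences, queries):
    """For each query, return the sorted indices of sentences containing all its words."""
    word_in_sentences = build_word_index(sentences)
    results = []
    for query in queries:
        words = query.split()
        if not words:
            # Assumption: an empty query is reported as "no match".
            results.append([-1])
            continue
        # A word that never occurs in any sentence yields an empty set.
        candidate_sets = [word_in_sentences.get(word, set()) for word in words]
        # Start from the smallest set so intermediate results stay small.
        candidate_sets.sort(key=len)
        matches = set(candidate_sets[0])
        for candidates in candidate_sets[1:]:
            matches &= candidates
            if not matches:
                break  # no sentence can match any more
        results.append(sorted(matches) if matches else [-1])
    return results


if __name__ == "__main__":
    sentences = ['hey how are you', 'how do you do', 'how are you doing']
    queries = ['how', 'how are']
    for indices in text_queries(sentences, queries):
        print(*indices)
```

With the example data from the question this prints `0 1 2` and `0 2`. The index is built in one pass over all words, and each query then only touches the sets for its own words instead of rescanning every sentence.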

Answer 2

I formatted the output so you can follow the loop and see how each item is retrieved. You could use this to print only the index, but I wanted you to understand how to get what you need.

This gets the basic output:

sentences = ['hey how are you', 'how do you do', 'how are you doing']
queries = ['how', 'how are']

for i, items in enumerate(sentences):
    for j in queries:
        if j in items:
            print(f"Query '{j}' is in Sentence {i}")

Output

(xenial)vash@localhost:~/python/stack_overflow$ python3.7 sent_find.py 
Query 'how' is in Sentence 0
Query 'how are' is in Sentence 0
Query 'how' is in Sentence 1
Query 'how' is in Sentence 2
Query 'how are' is in Sentence 2
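
One caveat about the loop above: `j in items` on two strings is a substring test. A query such as `are` would therefore also match a sentence containing `care`, and `you how` would not match any sentence even though both words are present. A whole-word variant might look like the sketch below. The helper name `sentences_with_all_words` is made up for illustration, and the approach still compares every query against every sentence, so on its own it is unlikely to fix the time limit problem.

```python
def sentences_with_all_words(sentences, queries):
    """For each query, list the indices of sentences containing every query word."""
    # Split each sentence once so the word sets are reused for every query.
    sentence_words = [set(sentence.split()) for sentence in sentences]
    results = []
    for query in queries:
        query_words = set(query.split())
        indices = [
            i for i, words in enumerate(sentence_words)
            if query_words <= words  # subset test: every query word is present
        ]
        results.append((query, indices))
    return results


sentences = ['hey how are you', 'how do you do', 'how are you doing']
queries = ['how', 'how are']

for query, indices in sentences_with_all_words(sentences, queries):
    print(f"Query '{query}' is in sentences {indices}")
```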

Answer 3

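You can store every contiguous run of words (every sub-array of a sentence's words) in a map. The value for each key in the map is a list (of sentence indices, of course). The idea, written out in Python:

    from collections import defaultdict

    def phrase_queries(sentence_list, query_list):
        # Map each contiguous run of words to the indices of the sentences containing it.
        phrase_map = defaultdict(list)
        for index, sentence in enumerate(sentence_list):
            words = sentence.split()
            for i in range(len(words)):
                for j in range(i, len(words)):
                    phrase = " ".join(words[i:j + 1])
                    hits = phrase_map[phrase]
                    # Skip the append if this sentence was already recorded for the phrase.
                    if not hits or hits[-1] != index:
                        hits.append(index)

        for query in query_list:
            # Print the matching indices, or -1 if the query never occurs.
            print(*phrase_map.get(query, [-1]))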

Time complexity: O(n^2) map insertions per sentence, where n is the largest number of words in any single sentence.
