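I'm implementing a combined phrase and keyword search (there is most likely a name for this kind of search, but I don't know it). For example, searching for I like turtles should match: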
I like turtles
He said I like turtles
I really like turtles
I really like those reptiles called turtles
Turtles is what I like
In short, a string must contain all of the keywords to match.
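As a minimal sketch of that matching rule (in Python, chosen only for illustration; the names `matches` and `candidates` are hypothetical), assuming keywords are separated by whitespace and compared as case-insensitive substrings:

```python
def matches(text: str, query: str) -> bool:
    """Return True if text contains every keyword of query.

    Assumption: keywords are compared as case-insensitive substrings,
    so "i" also matches inside words such as "said" or "like".
    """
    haystack = text.lower()
    return all(keyword in haystack for keyword in query.lower().split())


candidates = [
    "I like turtles",
    "He said I like turtles",
    "I really like turtles",
    "I really like those reptiles called turtles",
    "Turtles is what I like",
    "I like dogs",  # lacks "turtles", so it should be filtered out
]

# Keep only the candidates that contain all keywords of the query.
matched = [c for c in candidates if matches(c, "I like turtles")]
```

Because this is a plain substring test, a short keyword like "I" matches almost anything; whole-word matching would be stricter, but that is a design choice rather than something the examples settle.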
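Then there is the problem of ranking the search results.
Naively, I'm assuming that the closer the matches are to the start of the result and to their positions in the original query, the better the result. How would I express this in code?
My first approach was to assign each keyword in each result a score based on how close the keyword is to its expected position, where the expected position is derived from the original query. In pseudocode: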
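    score(result, query) {
        // compare case-insensitively so that "Turtles" matches "turtles"
        result = result.toLowerCase()
        query = query.toLowerCase()
        keywords = query.split(" ")
        total = 0
        for i = 0 to keywords.length() - 1 {
            total += keywordScore(result, query, keywords, i)
        }
        return total
    }

    keywordScore(result, query, keywords, i) {
        index = result.indexOf(keywords[i])
        // the first keyword is scored by its distance from the start of the result
        if (i == 0) return index
        previousIndex = result.indexOf(keywords[i-1])
        indexInSearch = query.indexOf(keywords[i])
        previousIndexInSearch = query.indexOf(keywords[i-1])
        // where this keyword would sit if it followed the previous one as in the query
        expectedIndex = previousIndex + (indexInSearch - previousIndexInSearch)
        return abs(index - expectedIndex)
    }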
The lower the score, the better the result. The scores for the examples above look decent:
I like turtles = 0
I really like turtles = 7
He said I like turtles = 8
I really like those reptiles called turtles = 38
Turtles is what I like = 39
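To make the scoring easier to experiment with, here is a rough Python sketch of the pseudocode above. It assumes both strings are lowercased before comparison and that every result already contains all keywords (`str.find` returns -1 otherwise, which would distort the score). The `rank` helper is a hypothetical addition for sorting. The exact figures depend on details such as case folding and tokenisation, so they may not reproduce every number in the list above.

```python
def keyword_score(result: str, query: str, keywords: list[str], i: int) -> int:
    """Distance between where keyword i occurs and where it was expected."""
    index = result.find(keywords[i])
    if i == 0:
        # The first keyword is scored by its distance from the start.
        return index
    previous_index = result.find(keywords[i - 1])
    # Offset between this keyword and the previous one in the query.
    offset_in_query = query.find(keywords[i]) - query.find(keywords[i - 1])
    expected_index = previous_index + offset_in_query
    return abs(index - expected_index)


def score(result: str, query: str) -> int:
    """Sum of per-keyword distances; lower means a closer match."""
    # Assumption: matching is case-insensitive.
    result, query = result.lower(), query.lower()
    keywords = query.split()
    return sum(keyword_score(result, query, keywords, i)
               for i in range(len(keywords)))


def rank(results: list[str], query: str) -> list[str]:
    """Sort results from best (lowest score) to worst."""
    return sorted(results, key=lambda r: score(r, query))


ranked = rank(
    [
        "Turtles is what I like",
        "I really like turtles",
        "I like turtles",
        "He said I like turtles",
    ],
    "I like turtles",
)
```

Note that `find` always returns the first occurrence, so a short keyword such as "i" can be located inside another word ("said", "is") rather than at the standalone word.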
Is this a viable way to rank search results?
Leaving aside any kind of semantic analysis, what else could I consider to improve it?