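I am currently working on a people-search tool in SOLR. It indexes and fuzzy-searches across multiple fields (using edismax), with various filters (such as SynonymFilterFactory, WordDelimiterFilterFactory, etc.) and with TF-IDF disabled.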
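This works well, except in the few cases where a search term matches more than once. For example, searching for "Martin XXXX" returns "Marvin Martin" as the top result, because "Martin" matches both "Marvin" and "Martin".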
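In general, matching a search term against several terms in a document makes a lot of sense. For a people search, however, I would like only the highest score for each search term to be added, i.e. each search term should map to a single term (name/info) in the document.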
Is there a mechanism in SOLR/Lucene that lets me enforce a one-to-one mapping between search terms and matched terms?
You can see the problem in the debug output of the query:
```
0.27641854 = (MATCH) sum of:
  0.27641854 = (MATCH) sum of:
    0.15077375 = (MATCH) weight(FullName:martin in 118169) [NoTFIDFSimilarityClass], result of:
      0.15077375 = score(doc=118169,freq=1.0 = termFreq=1.0
), product of:
        0.15077375 = queryWeight, product of:
          1.0 = idf(docFreq=1619, maxDocs=328317)
          0.15077375 = queryNorm
        1.0 = fieldWeight in 118169, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          1.0 = idf(docFreq=1619, maxDocs=328317)
          1.0 = fieldNorm(doc=118169)
    0.12564479 = (MATCH) weight(FullName:marvin^0.8333333 in 118169) [NoTFIDFSimilarityClass], result of:
      0.12564479 = score(doc=118169,freq=1.0 = termFreq=1.0
), product of:
        0.12564479 = queryWeight, product of:
          0.8333333 = boost
          1.0 = idf(docFreq=105, maxDocs=328317)
          0.15077375 = queryNorm
        1.0 = fieldWeight in 118169, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          1.0 = idf(docFreq=105, maxDocs=328317)
          1.0 = fieldNorm(doc=118169)
```
The query, for example:
http://domain/solr/peoplefinder/select?q=Martin~&wt=json&indent=true&defType=edismax&qf=FullName&stopwords=true&lowercaseOperators=true&debug=true
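To make the desired behaviour concrete, here is a small Python model of the scoring, built outside Solr. It is not a Solr/Lucene API. The function names (`sum_of_all`, `sum_of_best`, `one_to_one`) and the dict layout are hypothetical, and the scores are copied from the debug output above. The model contrasts three rules: the current behaviour, where every matching fuzzy expansion adds its score; keeping only each query term's best match; and a stricter one-to-one assignment, where a document term can be claimed by at most one query term.

```python
from itertools import permutations

# Hypothetical layout: query term -> {matched document term: score}.
Matches = dict[str, dict[str, float]]


def sum_of_all(matches: Matches) -> float:
    """Current behaviour: every (query term, doc term) match adds its score."""
    return sum(s for per_term in matches.values() for s in per_term.values())


def sum_of_best(matches: Matches) -> float:
    """Desired behaviour: each query term contributes only its best match."""
    return sum(max(per_term.values(), default=0.0) for per_term in matches.values())


def one_to_one(matches: Matches) -> float:
    """Stricter variant: each query term maps to a distinct document term.

    Brute force over all assignments, which is fine for short name fields
    but exponential in general.
    """
    terms = list(matches)
    doc_terms = sorted({d for per_term in matches.values() for d in per_term})
    # None slots let a query term stay unmatched.
    slots = doc_terms + [None] * len(terms)
    best = 0.0
    for assignment in permutations(slots, len(terms)):
        score = sum(matches[q].get(d, 0.0)
                    for q, d in zip(terms, assignment) if d is not None)
        best = max(best, score)
    return best


# Scores from the debug output: "martin~" expands to martin and marvin,
# and both occur in the document "Marvin Martin".
matches = {"martin~": {"martin": 0.15077375, "marvin": 0.12564479}}
print(round(sum_of_all(matches), 8))  # 0.27641854
print(sum_of_best(matches))           # 0.15077375
print(one_to_one(matches))            # 0.15077375
```

The two stricter rules differ only when several query terms compete for the same document term. For a hypothetical query "martin~ marvin~" against a document containing only "Martin", `sum_of_best` still counts "martin" twice, while `one_to_one` lets it be used once.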