For search with Solr 4.10.4, I created a synonyms.txt file and applied SynonymFilterFactory as follows:
<fieldType name="text_general" class="solr.TextField">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" splitOnCaseChange="0" splitOnNumerics="0" catenateWords="1" catenateNumbers="0" catenateAll="0" preserveOriginal="1" stemEnglishPossessive="0"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
synonyms.txt has the following content:
holland* => holland
holland, netherland, netherlands, niederlande
Under certain conditions, my application generates the term:
holland*
In that case, I want the results to be the same as when I search for holland, netherland, netherlands, or niederlande.
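But currently the term holland* does not give the matching results. The results for holland* do include the same documents as the terms 'holland' or 'netherland', but those appear at the bottom, so can we boost those results?
Does anyone have an idea how I can achieve this?
Here are more details:
For holland, I get some results, and when I debug the query it shows: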
"debug": {
"rawquerystring": "holland",
"querystring": "holland",
"parsedquery": "(name:holland name:netherland name:netherlands name:niederlande)/no_coord",
"parsedquery_toString": "name:holland name:netherland name:netherlands name:niederlande",
"explain": {
"country-NLD-de": "\n7.42217 = (MATCH) sum of:\n 7.42217 = (MATCH) weight(name:niederlande in 1775593) [DefaultSimilarity], result of:\n 7.42217 = score(doc=1775593,freq=1.0), product of:\n 0.5213204 = queryWeight, product of:\n 14.237252 = idf(docFreq=14, maxDocs=8413113)\n 0.036616646 = queryNorm\n 14.237252 = fieldWeight in 1775593, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 14.237252 = idf(docFreq=14, maxDocs=8413113)\n 1.0 = fieldNorm(doc=1775593)\n",
"country-NLD-en": "\n7.3550315 = (MATCH) sum of:\n 7.3550315 = (MATCH) weight(name:netherlands in 230095) [DefaultSimilarity], result of:\n 7.3550315 = score(doc=230095,freq=1.0), product of:\n 0.5189572 = queryWeight, product of:\n 14.172713 = idf(docFreq=15, maxDocs=8413113)\n 0.036616646 = queryNorm\n 14.172713 = fieldWeight in 230095, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 14.172713 = idf(docFreq=15, maxDocs=8413113)\n 1.0 = fieldNorm(doc=230095)\n",
"place-49218-de": "\n5.0385056 = (MATCH) sum of:\n 5.0385056 = (MATCH) weight(name:holland in 385574) [DefaultSimilarity], result of:\n 5.0385056 = score(doc=385574,freq=1.0), product of:\n 0.4295267 = queryWeight, product of:\n 11.730367 = idf(docFreq=183, maxDocs=8413113)\n 0.036616646 = queryNorm\n 11.730367 = fieldWeight in 385574, product of:\n 1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 11.730367 = idf(docFreq=183, maxDocs=8413113)\n 1.0 = fieldNorm(doc=385574)\n",
For holland*, the results contain some records for holland, but the debug section looks like this:
"debug": {
"rawquerystring": "holland*",
"querystring": "holland*",
"parsedquery": "name:holland*",
"parsedquery_toString": "name:holland*",
"explain": {
"place-51432-de": "\n1.0 = (MATCH) ConstantScore(name:holland name:hollandarod name:hollande name:hollander name:hollanderei name:hollandia name:hollandischer name:hollands name:hollandsbjerg name:hollandsch name:hollandsche name:hollandscheveld name:hollandsdiep name:hollandskamp name:hollandske), product of:\n 1.0 = boost\n 1.0 = queryNorm\n",
"place-49196-de": "\n1.0 = (MATCH) ConstantScore(name:holland name:hollandarod name:hollande name:hollander name:hollanderei name:hollandia name:hollandischer name:hollands name:hollandsbjerg name:hollandsch name:hollandsche name:hollandscheveld name:hollandsdiep name:hollandskamp name:hollandske), product of:\n 1.0 = boost\n 1.0 = queryNorm\n",
"place-49207-de": "\n1.0 = (MATCH) ConstantScore(name:holland name:hollandarod name:hollande name:hollander name:hollanderei name:hollandia name:hollandischer name:hollands name:hollandsbjerg name:hollandsch name:hollandsche name:hollandscheveld name:hollandsdiep name:hollandskamp name:hollandske), product of:\n 1.0 = boost\n 1.0 = queryNorm\n",
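In the debug sections above, the "parsedquery" part differs between holland and holland*. So I think the special character * does not work with SynonymFilterFactory.
Answer 0 (score: 0)
As far as I know, the synonyms file does not support wildcards.
In your case you may also run into another problem, because a wildcard in a query is normally used to search for results that are not an exact match. This depends on the query parser your query uses.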
In other words, the query "holland*" searches for all documents that contain a term starting with "holland".
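One possible way to address the ranking part of the question, offered as a sketch under assumptions rather than a confirmed fix: a wildcard clause such as holland* is not expanded by the synonym filter and is scored as a constant (see the ConstantScore entries in the debug output), so the application could add a second, boosted clause for the bare term. The bare term does go through the query analyzer, so the synonyms should be expanded and those documents should rank above the plain prefix matches. The helper name `boosted_prefix_query`, the default field `name`, and the boost value 10 are hypothetical and chosen only for illustration.

```python
def boosted_prefix_query(term: str, field: str = "name", boost: int = 10) -> str:
    """Build a query that ORs a boosted bare term with the original prefix query.

    Hypothetical helper: the field name and boost value are assumptions.
    """
    # Drop any trailing wildcard so the bare term can be analyzed normally
    # (and therefore expanded by the query-time synonym filter).
    base = term.rstrip("*")
    if not base:
        raise ValueError("term must contain more than wildcards")
    # First clause: analyzed term with a boost; second clause: prefix match.
    return f"{field}:{base}^{boost} OR {field}:{base}*"


print(boosted_prefix_query("holland*"))  # name:holland^10 OR name:holland*
```

The resulting string would be sent as the q parameter. Whether a boost of 10 is enough to lift the synonym matches to the top depends on the index, so check the explain output after trying it.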
If you want Solr to treat the wildcard as a plain character, you should escape it.
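As a minimal sketch of that escaping step on the application side: the Lucene query syntax uses a backslash to escape special characters, so the generated term can be rewritten before it is sent to Solr. The function name `escape_wildcards` is hypothetical, and this version only handles the two wildcard characters `*` and `?`, not the other special characters of the query syntax.

```python
def escape_wildcards(term: str) -> str:
    """Backslash-escape the wildcard characters * and ? in a query term.

    Hypothetical helper: other special query characters are left untouched.
    """
    escaped = []
    for ch in term:
        if ch in "*?":
            escaped.append("\\" + ch)  # prefix the wildcard with one backslash
        else:
            escaped.append(ch)
    return "".join(escaped)


query = "name:" + escape_wildcards("holland*")
print(query)  # name:holland\*
```

Once escaped, the term is no longer parsed as a prefix query, so it should pass through the query analyzer like any other term. With StandardTokenizerFactory the `*` would most likely be discarded as punctuation, leaving holland for the synonym filter to expand; this is worth confirming on the Analysis screen of the Solr admin UI.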
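Another mistake I see: in your field definition, you should define the analyzer type for both cases (index and query).
If you give the field type a single simple analyzer definition, as in the example above, it will be used for both indexing and querying.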
Answer 1 (score: -1)
Please try:
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true" tokenizerFactory="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>