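Hi, all you smart people of Stack Overflow,
I'm working on a project where some of the content is separated by a dot (.), so I'm trying to search for terms like word.tk, which are used by a tagging system within the content. These tags are stored in separate dynamic fields: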
{
  "ts_ticker_market": "word.tk",
  "ts_market": "tk",
  "ts_ticker": "word",
  "ds_date": "2007-07-30T21:00:00Z",
  "ts_ar_search": "en word.tk tk words qnb clarifies:",
  "ts_search": "en word.tk tk words qnb clarifies: ...",
  "content": "en word.tk tk words qnb clarifies: ..."
},
The search query might look like this, where $term is word.tk:
{!boost b=recip(ms(NOW,ds_date),3.16e-11,1,0.5)}
(
ts_ticker_market:($term)^200
OR ts_market:($term)^175
OR ts_ticker_market:($term)^125
OR ts_ticker:($term)^50
OR ts_ticker:($term*)^25
OR ts_title:($term)^4
OR ts_body:($term)^2
OR ts_search:($term)^1
)
I get this error:
analyzer returned too many terms for multiTerm term: word.tk
I hope someone can help me solve this.
Edit: more information, as requested. These are the analyzers from schema.xml:
<analyzer type="index">
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
<!-- Case insensitive stop word removal.
add enablePositionIncrements=true in both the index and query
analyzers to leave a 'gap' for more accurate phrase queries.
-->
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true"
/>
<filter class="solr.WordDelimiterFilterFactory"
protected="protwords.txt"
generateWordParts="1"
generateNumberParts="1"
catenateWords="1"
catenateNumbers="1"
catenateAll="0"
splitOnCaseChange="0"
preserveOriginal="1"/>
<filter class="solr.LengthFilterFactory" min="2" max="100" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true"
/>
<filter class="solr.WordDelimiterFilterFactory"
protected="protwords.txt"
generateWordParts="1"
generateNumberParts="1"
catenateWords="0"
catenateNumbers="0"
catenateAll="0"
splitOnCaseChange="0"
preserveOriginal="1"/>
<filter class="solr.LengthFilterFactory" min="2" max="100" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="multiterm">
<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="stopwords.txt"
enablePositionIncrements="true" />
<filter class="solr.WordDelimiterFilterFactory"
protected="protwords.txt"
generateWordParts="1"
generateNumberParts="1"
catenateWords="0"
catenateNumbers="0"
catenateAll="0"
splitOnCaseChange="1"
preserveOriginal="1"/>
<filter class="solr.LengthFilterFactory" min="2" max="100" />
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
Hope this helps.
Answer 0 (score: 0)
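To follow up: I could not get searches on dot-separated (.) terms to work. In the end I split the value into multiple fields and gave each field a different boost in Solr.
My search query ended up like this. It's clunky, but it gets the job done.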
ts_search:( $term )^0.01
OR ts_ticker:( $term )^0.3
OR ts_market:( $term )^0.2
OR ts_ticker_market:( $term )^0.4
OR ts_ticker:( $term*)^0.09
OR ts_market:( $term*)^0.04
OR ts_ticker_market:( $term*)^0.16
OR ts_company_name_ar:( $term )^0.01
OR ts_company_name_en:( $term )^0.01
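Fortunately, the text splits cleanly and each part is meaningful on its own. The original problem was that Solr could not search usefully: it returned many results that all had the same score, either 0 or some very high value.
With the separate fields and boosts, the results get more clearly separated scores.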
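For anyone hitting the same error: as far as I understand it, Solr raises "analyzer returned too many terms for multiTerm term" when the multiterm analyzer turns a single wildcard or prefix term (such as the $term* clauses) into more than one token. In the multiterm analyzer from the question, WordDelimiterFilterFactory with generateWordParts="1" and preserveOriginal="1" would split word.tk into several tokens, which may be what triggers it here. The fragment below is only a sketch of a reduced multiterm analyzer, not a tested fix for this schema. It assumes the same mapping file as in the question and swaps in KeywordTokenizerFactory so that each term stays a single token.

```xml
<!-- Sketch only (assumption, untested against this schema):
     a multiterm analyzer that emits exactly one token per term. -->
<analyzer type="multiterm">
  <!-- Same accent mapping as the index and query analyzers in the question -->
  <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  <!-- KeywordTokenizer treats the whole term as one token, so "word.tk" is not split -->
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <!-- Lowercasing keeps the one-token-in, one-token-out property -->
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

Note that the index analyzer still stems with SnowballPorterFilterFactory, so a prefix such as word* is compared against stemmed index tokens, and the matches may differ from what an unstemmed field would give.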