I am using Solr and have a field defined as follows:
<field name="Title" type="text_general" multiValued="false" indexed="true" stored="true">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</field>
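I append * to each search term to get prefix matches, e.g. Title:app*. For example, app* gives me app, application, and similar results.
But if I search for a term that contains a "-", the query returns nothing. For example:
Title:child-play* returns no results, but Title:child-play does!
Can anyone point out what the problem might be?
With debugging enabled I got this for Title:child-play: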
"debug":{
"rawquerystring":"Title:child-play",
"querystring":"Title::child-play",
"parsedquery":"Title::child Title::play",
"parsedquery_toString":"Title::child Title::play",
And this for Title:child-play*:
"debug":{
"rawquerystring":"CompanyName:child-play*",
"querystring":"CompanyName:child-play*",
"parsedquery":"CompanyName:child-play*",
"parsedquery_toString":"CompanyName:child-play*",
Answer 0 (score: 2)
I suggest you use WordDelimiterFilterFactory.
Just change the field's type to a custom type; in my case it is 'text_general':
<field name="Title" type="text_general"/>
Then you need to create the new type.
For example, here are my settings. You can customize them as needed.
<fieldType name="text_general" class="solr.TextField" omitNorms="false" positionIncrementGap="100" multiValued="true">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory" types="wdfftypes.txt" generateNumberParts="0" stemEnglishPossessive="0" splitOnCaseChange="1" preserveOriginal="1" catenateAll="1" catenateWords="1" catenateNumbers="1" generateWordParts="1" splitOnNumerics="1"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory" types="wdfftypes.txt" generateNumberParts="1" stemEnglishPossessive="0" splitOnCaseChange="1" preserveOriginal="1" catenateAll="1" catenateWords="1" catenateNumbers="1" generateWordParts="1" splitOnNumerics="1"/>
</analyzer>
</fieldType>
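To see why preserveOriginal matters for the wildcard query, here is a rough Python model (not Solr code) of prefix matching against indexed terms. It assumes, as the debug output in the question suggests, that a wildcard query such as child-play* is not tokenized and is compared as a single prefix against the terms in the index. The function names and the simplified tokenization are illustrative assumptions, not Solr APIs.

```python
import re

def index_terms(text, preserve_original):
    """Very simplified model of index-time analysis for one field value.

    Assumption: whitespace tokenization, lowercasing, then splitting each
    token on non-alphanumeric delimiters (roughly generateWordParts=1).
    """
    terms = set()
    for token in text.lower().split():
        parts = [p for p in re.split(r"[^a-z0-9]+", token) if p]
        terms.update(parts)
        if preserve_original:
            # Keep the unsplit token as well, like preserveOriginal=1.
            terms.add(token)
    return terms

def prefix_matches(prefix, terms):
    """Model of a wildcard query 'prefix*': matched against whole terms."""
    return sorted(t for t in terms if t.startswith(prefix.lower()))

without_original = index_terms("Child-Play Center", preserve_original=False)
with_original = index_terms("Child-Play Center", preserve_original=True)

print(prefix_matches("child-play", without_original))  # no term starts with "child-play"
print(prefix_matches("child-play", with_original))     # the preserved token matches
print(prefix_matches("chil", without_original))        # plain prefixes still work
```

In this model the hyphenated prefix only matches when the unsplit token is also in the index, which is what preserveOriginal="1" in the index analyzer is meant to provide.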
Read more here:
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
Parameters:
generateWordParts: (integer, default 1) If non-zero, splits words at delimiters.
For example:"CamelCase", "hot-spot" -> "Camel", "Case", "hot", "spot"
generateNumberParts: (integer, default 1) If non-zero, splits numeric strings at delimiters:"1947-32" ->"1947", "32"
splitOnCaseChange: (integer, default 1) If 0, words are not split on camel-case changes: "BugBlaster-XL" -> "BugBlaster", "XL".
splitOnNumerics: (integer, default 1) If 0, don't split words on transitions from alpha to numeric:"FemBot3000" -> "Fem", "Bot3000"
catenateWords: (integer, default 0) If non-zero, maximal runs of word parts will be joined: "hot-spot-sensor's" -> "hotspotsensor"
catenateNumbers: (integer, default 0) If non-zero, maximal runs of number parts will be joined: "1947-32" -> "194732"
catenateAll: (0/1, default 0) If non-zero, runs of word and number parts will be joined: "Zap-Master-9000" -> "ZapMaster9000"
preserveOriginal: (integer, default 0) If non-zero, the original token is preserved: "Zap-Master-9000" -> "Zap-Master-9000", "Zap", "Master", "9000"
protected: (optional) The pathname of a file that contains a list of protected words that should be passed through without splitting.
stemEnglishPossessive: (integer, default 1) If 1, strips the possessive "'s" from each subword.
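
The effect of several of these parameters on a single token can be sketched in Python. This is a simplified model for illustration only, not the Lucene implementation: it splits only on non-alphanumeric delimiters and does not model splitOnCaseChange, splitOnNumerics, stemEnglishPossessive, protected words, or token positions. The function name and the output ordering are assumptions.

```python
import re
from itertools import groupby

def word_delimiter(token, generate_word_parts=True, generate_number_parts=True,
                   catenate_words=False, catenate_numbers=False,
                   catenate_all=False, preserve_original=False):
    """Return the terms a simplified word-delimiter filter would emit."""
    # Split on any run of characters that is not a letter or digit.
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", token) if p]
    out = []

    def emit(term):
        # Avoid emitting the same term twice.
        if term and term not in out:
            out.append(term)

    if preserve_original:
        emit(token)
    for part in parts:
        if part.isdigit():
            if generate_number_parts:
                emit(part)
        elif generate_word_parts:
            emit(part)
    # Join maximal runs of word parts and of number parts.
    for is_number, run in groupby(parts, key=str.isdigit):
        joined = "".join(run)
        if (is_number and catenate_numbers) or (not is_number and catenate_words):
            emit(joined)
    if catenate_all:
        emit("".join(parts))
    return out

print(word_delimiter("hot-spot"))
print(word_delimiter("1947-32", catenate_numbers=True))
print(word_delimiter("Zap-Master-9000", preserve_original=True))
print(word_delimiter("Zap-Master-9000", catenate_all=True))
```

For the actual token order and edge cases, Solr's Analysis screen in the admin UI shows what the real filter emits for a given field type.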