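I have a problem and I really don't know how to solve it.
The setup is simple. I created two index entries in Solr:
"Scholastic Reader, Level 2 >" and "Scholastic Reader, Level 3 >"
(the > symbol is at the end of each string)
Search 1: When I search for "Scholastic Reader, Level", the service returns both entries, which is correct.
XML response: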
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">2</int>
<lst name="params">
<str name="indent">on</str>
<str name="start">0</str>
<str name="q">type:masterseries AND title:("Scholastic Reader, Level")</str>
<str name="version">2.2</str>
<str name="rows">10</str>
</lst>
</lst>
<result name="response" numFound="2" start="0">
<doc>
<str name="id">118</str>
<arr name="title">
<str>Scholastic Reader, Level 2 ></str>
</arr>
<str name="type">masterseries</str>
<str name="uuid">3bf5b10c-a286-4ad0-9c63-bb402f57a7ed</str>
</doc>
<doc>
<str name="id">118</str>
<arr name="title">
<str>Scholastic Reader, Level 3 ></str>
</arr>
<str name="type">masterseries</str>
<str name="uuid">cdb19c28-0988-4375-acf0-916bc6664055</str>
</doc>
</result>
</response>
Search 2: Searching for "Scholastic Reader, Level 3" returns "Scholastic Reader, Level 3 >". Great!
Query string: type:masterseries AND title:("Scholastic Reader, Level 3") XML response:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">2</int>
<lst name="params">
<str name="indent">on</str>
<str name="start">0</str>
<str name="q">type:masterseries AND title:("Scholastic Reader, Level 3")</str>
<str name="version">2.2</str>
<str name="rows">10</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<str name="id">118</str>
<arr name="title">
<str>Scholastic Reader, Level 3 ></str>
</arr>
<str name="type">masterseries</str>
<str name="uuid">cdb19c28-0988-4375-acf0-916bc6664055</str>
</doc>
</result>
</response>
But here is the strange part.
Search 3: Searching for "Scholastic Reader, Level 2", or even for the exact string "Scholastic Reader, Level 2 >", returns nothing.
Query string: type:masterseries AND title:("Scholastic Reader, Level 2") XML response:
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">2</int>
<lst name="params">
<str name="indent">on</str>
<str name="start">0</str>
<str name="q">type:masterseries AND title:("Scholastic Reader, Level 2")</str>
<str name="version">2.2</str>
<str name="rows">10</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>
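Entries created with other numbers such as 1, 4, 5, or 6 work fine, but the string with level "2" does not.
Thanks for your help.
Update:
Here is the relevant configuration from my schema.xml file: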
<fieldType name="text_en" class="solr.TextField"
positionIncrementGap="100">
<analyzer type="index">
<charFilter class="solr.HTMLStripCharFilterFactory" />
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.ISOLatin1AccentFilterFactory" />
<filter class="solr.StopFilterFactory"
ignoreCase="true" words="lang/stopwords_en.txt"
enablePositionIncrements="false" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.EnglishPossessiveFilterFactory" />
<filter class="solr.KeywordMarkerFilterFactory"
protected="protwords.txt" />
<filter class="solr.PorterStemFilterFactory" />
</analyzer>
<analyzer type="query">
<charFilter class="solr.HTMLStripCharFilterFactory" />
<tokenizer class="solr.StandardTokenizerFactory" />
<filter class="solr.SynonymFilterFactory"
synonyms="synonyms.txt" ignoreCase="true" expand="true" />
<filter class="solr.StopFilterFactory"
ignoreCase="true" words="lang/stopwords_en.txt"
enablePositionIncrements="false" />
<filter class="solr.LowerCaseFilterFactory" />
<filter class="solr.ISOLatin1AccentFilterFactory" />
<filter class="solr.EnglishPossessiveFilterFactory" />
<filter class="solr.KeywordMarkerFilterFactory"
protected="protwords.txt" />
<filter class="solr.PorterStemFilterFactory" />
</analyzer>
</fieldType>
Answer 0 (score: 2)
I'd bet your problem is here:
<filter class="solr.SynonymFilterFactory"
synonyms="synonyms.txt" ignoreCase="true" expand="true" />
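Take a look at synonyms.txt. My guess is that you will find an entry that replaces "2" with "too" (if it were "to", the StopFilter would remove it and you would never notice the difference). Because expand=true is set, this would produce a query like: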
"Scholastic Reader Level 2 too"
That is fine for a set of unquoted TermQuery clauses, but not for a PhraseQuery. To fix this, you could incorporate the SynonymFilter into the "index" analyzer.
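As a rough sketch of that suggestion, the "index" analyzer of the text_en field type from the question could look something like the fragment below. The only change is the added SynonymFilterFactory line, copied from the existing "query" analyzer. Placing it directly after the tokenizer is an assumption, as is leaving the "query" analyzer untouched. Documents would need to be reindexed before a change to index-time analysis has any effect.

```xml
<!-- Sketch only: index-time analyzer from the question with the synonym filter added. -->
<!-- The position of the new filter (right after the tokenizer) is an assumption. -->
<analyzer type="index">
  <charFilter class="solr.HTMLStripCharFilterFactory" />
  <tokenizer class="solr.StandardTokenizerFactory" />
  <!-- Added: same settings the "query" analyzer already uses -->
  <filter class="solr.SynonymFilterFactory"
          synonyms="synonyms.txt" ignoreCase="true" expand="true" />
  <!-- Remaining filters are unchanged from the question's schema.xml -->
  <filter class="solr.ISOLatin1AccentFilterFactory" />
  <filter class="solr.StopFilterFactory"
          ignoreCase="true" words="lang/stopwords_en.txt"
          enablePositionIncrements="false" />
  <filter class="solr.LowerCaseFilterFactory" />
  <filter class="solr.EnglishPossessiveFilterFactory" />
  <filter class="solr.KeywordMarkerFilterFactory"
          protected="protwords.txt" />
  <filter class="solr.PorterStemFilterFactory" />
</analyzer>
```

Before changing the schema, it is probably worth confirming that synonyms.txt really contains an entry for "2", since that part is a guess.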
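The other possibility I can see is that something odd happens because LowerCaseFilter comes after ISOLatin1AccentFilterFactory and StopFilter in the "index" analyzer, since the order in which filters are applied can change the output. I very much doubt that is the problem, though.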