I'm trying to configure a platform so that it accepts queries that start with a stop word. I have the following document:
{
"responseHeader":{
"status":0,
"QTime":1,
"params":{
"indent":"true",
"q":"*:*",
"wt":"json"}},
"response":{"numFound":1,"start":0,"docs":[
{
"weight_metric":0.3,
"maximumPowerDraw":9,
"beamAngle":50,
"name_de":"German",
"type":["product"],
"id":"5dac69a9-7d54-43f9-b815-0a54e519a1f0",
"name":"Aloa something"
}]
}}
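It has two name fields: name for English (the default) and name_de for German.
What I can't understand is why this query returns the document in the results: http://localhost:8080/solr-webapp/collection1/select?q=name_de:German%20welcher&wt=json&indent=true
But if I run this other query, which starts with a stop word (welcher), I get no results at all: http://localhost:8080/solr-webapp/collection1/select?q=name_de:welcher%20German%20welcher&wt=json&indent=true
I would expect both queries to return the same result as the first one.
For the default language, however, it works fine.
Here are some snippets from my schema.xml: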
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<!-- in this example, we will only use synonyms at query time
<filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
-->
<!-- Case insensitive stop word removal.
add enablePositionIncrements=true in both the index and query
analyzers to leave a 'gap' for more accurate phrase queries.
-->
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
enablePositionIncrements="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
<filter class="solr.EnglishMinimalStemFilterFactory"/>
-->
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
enablePositionIncrements="true"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
<filter class="solr.EnglishMinimalStemFilterFactory"/>
-->
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
<!-- German -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball"
enablePositionIncrements="true"/>
<filter class="solr.GermanNormalizationFilterFactory"/>
<!--<filter class="solr.GermanMinimalStemFilterFactory"/>-->
</analyzer>
</fieldType>
...
<field name="name" type="text_en" stored="true" indexed="true"/>
<field name="name_de" type="text_de" stored="true" indexed="true"/>
...
<copyField source="*_de" dest="text_de"/>
<copyField source="name" dest="text"/>
...
<field name="text" type="text_general" stored="false" indexed="true" multiValued="true" termVectors="true"/>
<field name="text_de" type="text_de" stored="false" indexed="true" multiValued="true" termVectors="true"/>
Does anyone know how to fix this unwanted behavior? (For the name field, I do get the expected behavior with http://localhost:8080/solr-webapp/collection1/select?q=name:the%20Aloa&wt=json&indent=true)
Answer 0 (score: 4)
The problem is your query syntax. See the fielded-search example in the Lucene query syntax documentation. Your query:
name_de:welcher German welcher
searches name_de only for the first query term. The remaining terms are searched in the default field (name). So your query is effectively:
name_de:welcher name:German name:welcher
Instead, try:
name_de:(welcher German welcher)
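If you build the request URL in code, the grouped form can be generated and URL-encoded as in the rough sketch below. It assumes the base URL from the question; the helper name fielded_query_url is made up for illustration and is not part of any Solr client library.

```python
from urllib.parse import urlencode

# Base URL copied from the question; adjust it for your own install.
SOLR_SELECT = "http://localhost:8080/solr-webapp/collection1/select"


def fielded_query_url(field, text):
    """Build a select URL in which every term of `text` is searched in `field`.

    Hypothetical helper for illustration only.
    """
    # The parentheses group the terms so the field prefix applies to all of
    # them, not just to the first one.
    q = f"{field}:({text})"
    # urlencode percent-encodes the colon and parentheses and turns spaces into '+'.
    return SOLR_SELECT + "?" + urlencode({"q": q, "wt": "json", "indent": "true"})


print(fielded_query_url("name_de", "welcher German welcher"))
```

Appending debugQuery=true to the request should also show how the query string was actually parsed, which makes it easy to check which field each term ended up in.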