Cannot get results for a Solr query that starts with a stop word

Asked: 2015-02-14 13:41:19

Tags: json search solr lucene

I am trying to configure a platform so that it accepts queries that start with a stop word. I have the following document:

        {
          "responseHeader":{
            "status":0,
            "QTime":1,
            "params":{
              "indent":"true",
              "q":"*:*",
              "wt":"json"}},
          "response":{"numFound":1,"start":0,"docs":[
              {
                "weight_metric":0.3,
                "maximumPowerDraw":9,
                "beamAngle":50,
                "name_de":"German",
                "type":["product"],
                "id":"5dac69a9-7d54-43f9-b815-0a54e519a1f0",
                "name":"Aloa something"
                }]
          }}

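The document has a field called name for English (the default language) and another field called name_de for German. What I cannot understand is why this query returns the document:

http://localhost:8080/solr-webapp/collection1/select?q=name_de:German%20welcher&wt=json&indent=true

while this other query, which starts with a stop word (welcher), returns no results at all:

http://localhost:8080/solr-webapp/collection1/select?q=name_de:welcher%20German%20welcher&wt=json&indent=true

I would expect both queries to return the same result as the first one.
For the default language, however, it works as expected.
Here are some excerpts from my schema.xml: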

    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- in this example, we will only use synonyms at query time
                 <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <!-- Case insensitive stop word removal.
                   add enablePositionIncrements=true in both the index and query
          analyzers to leave a 'gap' for more accurate phrase queries.
        -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
                enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
                     <filter class="solr.EnglishMinimalStemFilterFactory"/>
        -->
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"
                enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
                     <filter class="solr.EnglishMinimalStemFilterFactory"/>
        -->
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

    <!-- German -->
    <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball"
                enablePositionIncrements="true"/>
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <!--<filter class="solr.GermanMinimalStemFilterFactory"/>-->

      </analyzer>
    </fieldType>

...

    <field name="name" type="text_en" stored="true" indexed="true"/>
    <field name="name_de" type="text_de" stored="true" indexed="true"/>
...

    <copyField source="*_de" dest="text_de"/>
    <copyField source="name" dest="text"/>

...

    <field name="text" type="text_general" stored="false" indexed="true" multiValued="true" termVectors="true"/>
    <field name="text_de" type="text_de" stored="false" indexed="true" multiValued="true" termVectors="true"/>

Does anyone know how to fix this undesired behaviour? (For the field name, by contrast, the query http://localhost:8080/solr-webapp/collection1/select?q=name:the%20Aloa&wt=json&indent=true gives the expected behaviour.)

1 Answer:

Answer 0 (score: 4)

The problem is your query syntax. See the fielded-query example in the Lucene query syntax documentation. Your query is:

    name_de:welcher German welcher

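A field prefix applies only to the term directly after it, so this searches name_de for the first query term only. The remaining terms are searched in the default field (name). Your query is effectively: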

    name_de:welcher name:German name:welcher
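One way to check how Solr actually parsed the query is to add the `debugQuery=true` request parameter and read the parsed query in the debug section of the response. The following is only a sketch that builds such a URL. The host, port, and core are copied from the question, and the helper name `debug_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Base URL copied from the question; adjust host, port and core for your setup.
SOLR_SELECT = "http://localhost:8080/solr-webapp/collection1/select"


def debug_url(query):
    """Build a select URL that asks Solr to include query-parsing debug output.

    `debug_url` is a hypothetical helper, not part of any Solr client library.
    """
    params = {"q": query, "wt": "json", "indent": "true", "debugQuery": "true"}
    # urlencode percent-encodes ':' and turns spaces into '+'.
    return SOLR_SELECT + "?" + urlencode(params)


url = debug_url("name_de:welcher German welcher")
print(url)
```

Opening the printed URL in a browser should show which field each term was assigned to.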

Instead, group the terms so that the field applies to all of them:

    name_de:(welcher German welcher)
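If the query is assembled in code, the grouping and the URL encoding can be handled together. This is a minimal sketch, assuming the same host and core as in the question. `fielded_query` is a hypothetical helper name, and the sketch does not escape Lucene special characters inside the terms.

```python
from urllib.parse import urlencode

# Base URL copied from the question; adjust host, port and core for your setup.
SOLR_SELECT = "http://localhost:8080/solr-webapp/collection1/select"


def fielded_query(field, terms):
    """Group all terms under one field: field:(t1 t2 ...).

    Hypothetical helper; terms are assumed to contain no Lucene special characters.
    """
    return "{}:({})".format(field, " ".join(terms))


q = fielded_query("name_de", ["welcher", "German", "welcher"])
# urlencode percent-encodes ':', '(' and ')' and turns spaces into '+'.
url = SOLR_SELECT + "?" + urlencode({"q": q, "wt": "json", "indent": "true"})
print(q)
print(url)
```

Building the parameters with `urlencode` rather than by string concatenation avoids hand-written `%20` sequences and keeps the parentheses correctly encoded.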