Solr: trouble searching content that contains dots

Time: 2016-01-04 14:45:57

Tags: solr lucene

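Hi, all you smart people of Stack Overflow,

I'm working on a project where some content is separated by dots (.), so I'm trying to search for strings like word.tk, which the content uses as tags. These tags are stored in separate dynamic fields: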

{
    "ts_ticker_market": "word.tk",
    "ts_market":        "tk",
    "ts_ticker":        "word",
    "ds_date":          "2007-07-30T21:00:00Z",
    "ts_ar_search":     "en word.tk tk words qnb clarifies:",
    "ts_search":        "en word.tk tk words qnb clarifies: ...",
    "content":          "en word.tk tk words qnb clarifies: ...",
}
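The three tag fields in this document are all derived from the one tag. A minimal Python sketch of that derivation, assuming the tag always has the form ticker.market with the market code after the last dot (the helper name `split_tag` is hypothetical, not part of Solr or the original project):

```python
def split_tag(tag: str) -> dict:
    """Hypothetical helper: derive the three tag fields shown in the example document.

    Assumes the market code follows the last dot, e.g. "word.tk" -> ticker "word", market "tk".
    """
    ticker, sep, market = tag.rpartition(".")
    if not sep or not ticker or not market:
        raise ValueError(f"expected a tag like 'ticker.market', got {tag!r}")
    return {
        "ts_ticker_market": tag,
        "ts_market": market,
        "ts_ticker": ticker,
    }


print(split_tag("word.tk"))
# {'ts_ticker_market': 'word.tk', 'ts_market': 'tk', 'ts_ticker': 'word'}
```

Splitting on the last dot (`rpartition`) rather than the first is itself an assumption; it only matters if a ticker can contain a dot.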

A search query might look like this, where $term is word.tk:

{!boost b=recip(ms(NOW,ds_date),3.16e-11,1,0.5)}
    (
        ts_ticker_market:($term)^200
        OR ts_market:($term)^175
        OR ts_ticker_market:($term)^125
        OR ts_ticker:($term)^50
        OR ts_ticker:($term*)^25
        OR ts_title:($term)^4
        OR ts_body:($term)^2
        OR ts_search:($term)^1
    )
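Only one clause in this query is a prefix query: `ts_ticker:($term*)`. In Solr, prefix and wildcard terms on a text field are run through the field's `multiterm` analyzer instead of the normal query analyzer, so that clause is presumably the one that triggers the error. A small Python sketch that fills in `$term` as described and lists the prefix clauses; `build_query`, `multiterm_clauses` and `CLAUSES` are hypothetical names, since the question does not show the application code that substitutes `$term`:

```python
# Hypothetical Python equivalent of filling in the $term placeholder.
BOOST_PREFIX = "{!boost b=recip(ms(NOW,ds_date),3.16e-11,1,0.5)}"

# (field, boost, is_prefix) in the same order as the query above.
CLAUSES = [
    ("ts_ticker_market", 200, False),
    ("ts_market", 175, False),
    ("ts_ticker_market", 125, False),
    ("ts_ticker", 50, False),
    ("ts_ticker", 25, True),  # prefix query: analyzed by the multiterm analyzer
    ("ts_title", 4, False),
    ("ts_body", 2, False),
    ("ts_search", 1, False),
]


def build_query(term: str) -> str:
    """Join all clauses with OR and wrap them in the date-recency boost."""
    clauses = [
        f"{field}:({term}{'*' if is_prefix else ''})^{boost}"
        for field, boost, is_prefix in CLAUSES
    ]
    return BOOST_PREFIX + "(" + " OR ".join(clauses) + ")"


def multiterm_clauses(term: str) -> list:
    """Return only the prefix clauses, i.e. those Solr sends through multiterm analysis."""
    return [f"{field}:({term}*)^{boost}" for field, boost, is_prefix in CLAUSES if is_prefix]


print(build_query("word.tk"))
print(multiterm_clauses("word.tk"))  # ['ts_ticker:(word.tk*)^25']
```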

I get this error:

analyzer returned too many terms for multiTerm term: word.tk

Can anyone help me solve this?

Edit: more information, as requested. These are the analyzers from schema.xml:

<analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!-- Case insensitive stop word removal.
      add enablePositionIncrements=true in both the index and query
      analyzers to leave a 'gap' for more accurate phrase queries.
    -->
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="1"
            catenateNumbers="1"
            catenateAll="0"
            splitOnCaseChange="0"
            preserveOriginal="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"
            />
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="0"
            preserveOriginal="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="multiterm">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory"
            protected="protwords.txt"
            generateWordParts="1"
            generateNumberParts="1"
            catenateWords="0"
            catenateNumbers="0"
            catenateAll="0"
            splitOnCaseChange="1"
            preserveOriginal="1"/>
    <filter class="solr.LengthFilterFactory" min="2" max="100" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>

Hope this helps.
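Why the prefix term word.tk fails can be approximated outside Solr. Solr expects a `multiterm` analyzer to reduce the text of a prefix or wildcard term to exactly one token, but the chain above tokenizes on whitespace and then runs `WordDelimiterFilterFactory` with `generateWordParts="1"` and `preserveOriginal="1"`, which turns word.tk into word.tk, word and tk. The Python below is a rough model under those assumptions (simplified delimiter rules; synonyms, stop words and stemming omitted), not Lucene's implementation, and all function names are hypothetical:

```python
import re

# Rough model of the "multiterm" analyzer above. Synonyms, stop words and Snowball
# stemming are left out; they do not change the token count for "word.tk".


def word_delimiter(token: str, preserve_original: bool = True) -> list:
    """Simplified WordDelimiterFilter (generateWordParts=1, catenate*=0).

    Splits on non-alphanumeric characters only (no case-change splitting);
    with preserveOriginal the unsplit token is kept too whenever a split happened.
    """
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    if parts == [token]:
        return [token]
    return ([token] if preserve_original else []) + parts


def multiterm_analyze(text: str) -> list:
    tokens = []
    for tok in text.split():                             # WhitespaceTokenizer
        tokens.extend(word_delimiter(tok))
    tokens = [t for t in tokens if 2 <= len(t) <= 100]   # LengthFilter min=2 max=100
    tokens = [t.lower() for t in tokens]                 # LowerCaseFilter
    return list(dict.fromkeys(tokens))                   # drops repeats anywhere (simplified)


def analyze_multiterm_term(text: str) -> str:
    """Mimic Solr's rule that multiterm analysis must yield exactly one term."""
    tokens = multiterm_analyze(text)
    if not tokens:
        raise ValueError(f"analyzer returned no terms for multiTerm term: {text}")
    if len(tokens) > 1:
        raise ValueError(f"analyzer returned too many terms for multiTerm term: {text}")
    return tokens[0]


print(multiterm_analyze("word.tk"))     # ['word.tk', 'word', 'tk']
print(analyze_multiterm_term("word"))   # word
```

Under this model, a multiterm chain that keeps the input as a single token (for example `solr.KeywordTokenizerFactory` followed by `solr.LowerCaseFilterFactory`) would likely avoid this particular error, although it would also stop splitting on the dot.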

1 Answer:

Answer 0 (score: 0)

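To follow up: I never managed to get searching on dot-separated terms to work. In the end I split them into multiple fields and gave each field a different boost in Solr.

My search ended up looking like this. Clunky, but it gets the job done.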

ts_search:( $term )^0.01
 OR ts_ticker:( $term )^0.3
 OR ts_market:( $term )^0.2
 OR ts_ticker_market:( $term )^0.4
 OR ts_ticker:( $term*)^0.09
 OR ts_market:( $term*)^0.04
 OR ts_ticker_market:( $term*)^0.16
 OR ts_company_name_ar:( $term )^0.01
 OR ts_company_name_en:( $term )^0.01

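Luckily, the text splits cleanly and each part carries meaning on its own. The problem with searching the combined value was that Solr returned many results that all had the same score, either 0 or some very high value.

With separate fields, the results get more clearly differentiated scores.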