Solr only returns results when the search term is capitalized

Time: 2017-02-02 09:34:16

Tags: solr token

I have the following problem with Solr search.

When I search for a word that starts with "oe", "ae" or "ue" (which in German are equivalent to ö, ä and ü) and the search term is not capitalized, Solr returns 0 results.

However, when I search Solr for the same word with its first character capitalized, I get results.

When I run the search in debug mode, I see that the non-capitalized search term is always converted, e.g. "ue" -> "u":

"response": {
    "numFound": 0,
    "start": 0,
    "docs": []
  },
  "debug": {
    "rawquerystring": "uetze",
    "querystring": "uetze",
    "parsedquery": "(+DisjunctionMaxQuery((content:utze | title:utze | keywords:utze | description:utze^2.0 | browserTitle:utze^3.0)))/no_coord",
    "parsedquery_toString": "+(content:utze | title:utze | keywords:utze | description:utze^2.0 | browserTitle:utze^3.0)",
    "explain": {},
    "QParser": "ExtendedDismaxQParser",
    "altquerystring": null,

The following filters are used at index time:

<fieldType name="text"        class="solr.TextField"      sortMissingLast="true"  positionIncrementGap="100">
    <analyzer>
        <tokenizer            class="solr.WhitespaceTokenizerFactory" />
        <filter               class="solr.StopFilterFactory" ignoreCase="true" words="stopwords-de.txt" /> <!-- DE -->
        <filter               class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" splitOnCaseChange="0" splitOnNumerics="0" catenateWords="1" catenateNumbers="0" catenateAll="1" stemEnglishPossessive="1" preserveOriginal="1" />
        <filter               class="solr.GermanNormalizationFilterFactory" /> <!-- DE -->
        <filter               class="solr.ASCIIFoldingFilterFactory" /> <!--  DE -->
        <filter               class="solr.LowerCaseFilterFactory" />
        <filter               class="solr.WordDelimiterFilterFactory" catenateAll="1" preserveOriginal="1" />
    </analyzer>

</fieldType>

Does anyone know how to avoid this conversion? Any help is appreciated!

1 answer:

Answer 0: (score: 0)

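Yes, this is expected behavior given your configuration, because GermanNormalizationFilterFactory does the following:

'ß' is replaced by 'ss'.
'ä', 'ö', 'ü' are replaced by 'a', 'o', 'u', respectively.
'ae' and 'oe' are replaced by 'a' and 'o', respectively.
'ue' is replaced by 'u', when not following a vowel or q.

So you could remove it, and ue would no longer be replaced by u. Another approach that might help (I'm not sure I fully understand your use case) is to put <filter class="solr.LowerCaseFilterFactory" /> above the GermanNormalizationFilterFactory.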
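The reordering idea could look roughly like the sketch below. It reuses the filters and attributes from the question unchanged and only moves LowerCaseFilterFactory in front of GermanNormalizationFilterFactory. The assumption here (not verified against this exact setup) is that the normalization rules act on lowercase letters, so lowercasing first would make "Uetze" and "uetze" end up as the same token at both index and query time.

```xml
<fieldType name="text" class="solr.TextField" sortMissingLast="true" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords-de.txt" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" splitOnCaseChange="0" splitOnNumerics="0" catenateWords="1" catenateNumbers="0" catenateAll="1" stemEnglishPossessive="1" preserveOriginal="1" />
        <!-- Moved up (assumption): lowercase first, so capitalized and
             non-capitalized terms are normalized the same way -->
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.GermanNormalizationFilterFactory" />
        <filter class="solr.ASCIIFoldingFilterFactory" />
        <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" preserveOriginal="1" />
    </analyzer>
</fieldType>
```

Because this changes index-time analysis, existing documents would need to be reindexed before the new filter order takes effect for them.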