Solr search for German words

Time: 2014-08-20 14:11:15

Tags: solr4

I have a problem with German words. Solr (version 4.0.0) tokenizes Kälte into two wrong tokens. Maybe my definition of the German text field is wrong.

The field type is defined as follows.

<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <!-- Analysis chain applied when documents are indexed -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
  </analyzer>
  <!-- Analysis chain applied to query terms (same as above, without synonyms) -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.GermanNormalizationFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="German2"/>
  </analyzer>
</fieldType>

Debug query:

<str name="parsedquery">text_de:kã text_de:lte</str><str name="parsedquery_toString">text_de:kã text_de:lte</str>

1 Answer:

Answer 0: (score: 1)

If you are running Tomcat as the application container, you can try editing the server.xml file and adding URIEncoding="UTF-8" to the AJP/1.3 Connector. That is the solution I found.
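The parsed query in the question is consistent with a request-encoding problem rather than a fault in the field type. The UTF-8 bytes of "ä" (0xC3 0xA4), read as ISO-8859-1, become "Ã¤". After lowercasing, the tokenizer splits at the "¤" sign, which leaves "kã" and "lte". A minimal sketch of the connector entry in Tomcat's conf/server.xml might look like the following. The port and redirectPort values are assumed here (they are Tomcat's usual defaults) and should be matched to the existing entry.

```xml
<!-- Sketch only: port and redirectPort are assumed default values;
     keep whatever the existing Connector entry already uses. -->
<!-- URIEncoding makes Tomcat decode the request URI (including the
     q= parameter of a GET request) as UTF-8. -->
<Connector port="8009" protocol="AJP/1.3" redirectPort="8443"
           URIEncoding="UTF-8"/>
```

If Solr is reached directly through Tomcat's HTTP/1.1 connector instead of AJP, the same URIEncoding attribute can be set on that Connector element. Tomcat needs a restart for the change to take effect.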