Arabic words are not stored and indexed with Solr 3.6.1

Time: 2014-10-17 02:11:40

Tags: php solr arabic

Hi all,

I was looking for a way to index Arabic text with Apache Solr 3.6.1 and used the following field type.

In the schema:

            <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
              <analyzer>
                <tokenizer class="solr.StandardTokenizerFactory"/>
                <!-- for any non-Arabic text -->
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true"
                        words="lang/stopwords_ar.txt" enablePositionIncrements="true"/>
                <!-- normalizes ﻯ to ﻱ, etc. -->
                <filter class="solr.ArabicNormalizationFilterFactory"/>
                <filter class="solr.ArabicStemFilterFactory"/>
              </analyzer>
            </fieldType>

Response output:

            <response>
            <lst name="responseHeader">
            <int name="status">0</int>
            <int name="QTime">8</int>
            <lst name="params">
            <str name="indent">on</str>
            <str name="start">0</str>
            <str name="q">*:*</str>
            <str name="version">2.2</str>
            <str name="rows">10</str>
            </lst>
            </lst>
            <result name="response" numFound="1" start="0">
            <doc>
            <str name="company_name">?????</str>
            <str name="id">1</str>
            <arr name="search_supplier_keyword">
            <str>?????</str>
            </arr>
            <str name="supplier_name">?????</str>
            </doc>
            </result>
            </response>

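I am unable to store the Arabic words; they come back as question marks (?????) instead. Is there something I am missing here? Please help me with a possible solution.

Thanks, ABS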

1 answer:

Answer 0 (score: 0)

In Solr 3.3 we use the following schema for Arabic words, and it works well. It may help you as well.

Schema:

<fieldType name="text_arabic" class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.ArabicNormalizationFilterFactory"/>
  <filter class="solr.ArabicStemFilterFactory"/>
 </analyzer>
 <analyzer type="query">
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.ArabicNormalizationFilterFactory"/>
  <filter class="solr.ArabicStemFilterFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
 </analyzer>
</fieldType>
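
A field type only takes effect once a field refers to it. As a minimal sketch, a field using the type above could be declared in the `<fields>` section of schema.xml as shown here. The field name `Education_arabic` is taken from the output in this answer; the `indexed`, `stored` and `multiValued` values are assumptions, inferred from the fact that the output is a stored `<arr>` value, and are not part of the original schema.

```xml
<!-- Hypothetical field declaration; attribute values are assumed, not from the original post -->
<fields>
  <!-- type must match the fieldType name defined above -->
  <field name="Education_arabic" type="text_arabic"
         indexed="true" stored="true" multiValued="true"/>
</fields>
```

Note that the analyzer chain only shapes the indexed terms. Stored values are returned as they were received, so question marks in a stored field suggest the text may already have been converted before it reached Solr (for example by a request that was not sent as UTF-8). That is an inference, not something either post confirms.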

Output:

 <arr name="Education_arabic">
  <str>مهندس مدنى خبرة لا تقل عن 4سنوات</str>