How to index large content in Solr 5.2.1?

Asked: 2015-11-04 15:52:32

Tags: ruby-on-rails solr sunspot sunspot-rails sunspot-solr

Our content is larger than 32 KB, and Solr refuses to index it.

See the logs below.

Rails log:

RSolr::Error::Http: RSolr::Error::Http - 400 Bad Request Error: {'responseHeader'=>{'status'=>400,'QTime'=>13},'error'=>{'msg'=>'Exception writing document id Article 872cc4f7-8731-4049-b889-85a040edb543 to the index; possible analysis error.','code'=>400}}

Solr Log:

INFO  - 2015-11-04 15:00:30.772; [   collection] org.apache.solr.update.processor.LogUpdateProcessor; [collection] webapp=/solr path=/update params={wt=ruby} {} 0 27

ERROR - 2015-11-04 15:00:30.779; [   collection] org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Exception writing document id Article 872cc4f7-8731-4049-b889-85a040edb543 to the index; possible analysis error.

...

 Caused by: java.lang.IllegalArgumentException: Document contains at least one immense term in field="content_textv" (whose UTF8 encoding is longer than the max length 32766), all of which were skipped.  Please correct the analyzer to not produce such terms.  The prefix of the first immense term is: '[60, 112, 62, 83, 109, 97, 108, 108, 32, 97, 110, 100, 32, 77, 101, 100, 105, 117, 109, 32, 83, 99, 97, 108, 101, 32, 69, 110, 116, 101]...'
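
The byte prefix in the exception can be decoded to confirm which value is being rejected, and the same limit can be checked on the Rails side before a document is sent. The following is a minimal Ruby sketch, not part of Sunspot or RSolr: the helper name `immense_term?` is hypothetical, and the limit of 32766 bytes is taken from the error message above.

```ruby
# Byte prefix copied from the Solr error message above.
prefix_bytes = [60, 112, 62, 83, 109, 97, 108, 108, 32, 97, 110, 100, 32,
                77, 101, 100, 105, 117, 109, 32, 83, 99, 97, 108, 101, 32,
                69, 110, 116, 101]

# Pack the integers as unsigned 8-bit bytes and label the result as UTF-8.
prefix = prefix_bytes.pack('C*').force_encoding('UTF-8')
puts prefix
# prints: <p>Small and Medium Scale Ente

# Hypothetical helper: the limit applies to the UTF-8 encoded length in
# bytes, not to the number of characters, so bytesize is the right measure.
def immense_term?(value, limit = 32_766)
  value.bytesize > limit
end

puts immense_term?('a' * 40_000)
# prints: true
```

The decoded prefix begins with `<p>`, which suggests that the whole HTML body of the article is being handed to Solr as one single term.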

Content field type:

 <field name="content_textv" type="strings"/>

...

 <fieldType name="strings" class="solr.StrField" multiValued="true" sortMissingLast="true"/>

How can we index large content?

1 answer:

Answer 0 (score: 1)

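Use solr.TextField instead of solr.StrField. A StrField is not analyzed, so the entire value is indexed as one term, and a term may not exceed 32766 bytes; a TextField is tokenized into many short terms. Create a new fieldType such as: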

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="false">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Then you can use that field type as follows:

<field name="content_textv" type="text_general" indexed="true" stored="false" multiValued="true"/>
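
To illustrate why the change of field type removes the error, here is a rough Ruby sketch. It is only an approximation under stated assumptions: the sample `content` is made up, and splitting on non-word characters with lowercasing merely stands in for what `StandardTokenizerFactory` and `LowerCaseFilterFactory` do; it is not Solr's actual analysis chain.

```ruby
# Hypothetical stand-in for a large HTML article body (roughly 58 KB).
content = '<p>' + ('Small and Medium Scale Enterprises ' * 1_700) + '</p>'

# Maximum term length in bytes, as reported in the Solr error message.
max_term_bytes = 32_766

# A StrField-like field indexes the entire value as one single term.
str_field_terms = [content]

# A TextField-like field tokenizes first. Lowercasing and splitting on
# non-word characters is a crude approximation of the analyzer above.
text_field_terms = content.downcase.scan(/\w+/)

# Collect the terms whose UTF-8 encoding exceeds the limit.
too_long = ->(terms) { terms.select { |t| t.bytesize > max_term_bytes } }

puts too_long.call(str_field_terms).size
# prints: 1
puts too_long.call(text_field_terms).size
# prints: 0
```

With the string type, the single term is the whole document and exceeds the limit; after tokenization, no individual term comes close to it. Documents indexed under the old definition generally need to be reindexed after the schema change.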