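The requirement is that HTML tags must not be stripped at index time, because I need the content with its HTML tags later in order to display the document with proper styling. I only want the snippet text generated by the Solr 5.2.1 highlighting module to be returned without HTML markup. Please suggest whether this is possible and, if so, how to do it.
Please find below the relevant part of managed-schema.xml: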
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
      <str name="typeClass">solr.TextField</str>
    </processor>
  </analyzer>
</fieldType>
Please find below the relevant part of solrconfig.xml:
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <processor class="solr.HTMLStripFieldUpdateProcessorFactory">
    <str name="typeClass">solr.TextField</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>