I'm using CF10, updated per corporatezen.com/2013/11/updating-solr-engine-coldfusion, so it should be running Solr 3.4. I added <charFilter class="solr.HTMLStripCharFilterFactory"/> to <fieldType name="text">, but the summary field in the search results still contains HTML. Any idea why?
<field name="summary" type="text" indexed="false" stored="true" required="false" />
http://localhost:8985/solr/test/admin/schema.jsp shows:
Field: summary
Field Type: text
Properties: Tokenized, Stored
Schema: Tokenized, Stored
Position Increment Gap: 100
Index Analyzer: org.apache.solr.analysis.TokenizerChain
Char Filters:
  org.apache.solr.analysis.HTMLStripCharFilterFactory args:{luceneMatchVersion:LUCENE_24}
Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory
Filters:
  org.apache.solr.analysis.StopFilterFactory args:{words:stopwords.txt ignoreCase:true enablePositionIncrements:true luceneMatchVersion:LUCENE_24}
  org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange:1 generateNumberParts:1 catenateWords:1 luceneMatchVersion:LUCENE_24 generateWordParts:1 catenateAll:0 catenateNumbers:1}
  org.apache.solr.analysis.LowerCaseFilterFactory args:{luceneMatchVersion:LUCENE_24}
  org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected:protwords.txt luceneMatchVersion:LUCENE_24}
  org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{luceneMatchVersion:LUCENE_24}
Query Analyzer: org.apache.solr.analysis.TokenizerChain
Char Filters:
  org.apache.solr.analysis.HTMLStripCharFilterFactory args:{luceneMatchVersion:LUCENE_24}
Tokenizer Class: org.apache.solr.analysis.WhitespaceTokenizerFactory
Filters:
  org.apache.solr.analysis.SynonymFilterFactory args:{synonyms:synonyms.txt expand:true ignoreCase:true luceneMatchVersion:LUCENE_24}
  org.apache.solr.analysis.StopFilterFactory args:{words:stopwords.txt ignoreCase:true luceneMatchVersion:LUCENE_24}
  org.apache.solr.analysis.WordDelimiterFilterFactory args:{splitOnCaseChange:1 generateNumberParts:1 catenateWords:0 luceneMatchVersion:LUCENE_24 generateWordParts:1 catenateAll:0 catenateNumbers:0}
  org.apache.solr.analysis.LowerCaseFilterFactory args:{luceneMatchVersion:LUCENE_24}
  org.apache.solr.analysis.EnglishPorterFilterFactory args:{protected:protwords.txt luceneMatchVersion:LUCENE_24}
  org.apache.solr.analysis.RemoveDuplicatesTokenFilterFactory args:{luceneMatchVersion:LUCENE_24}
Answer 0 (score: 3)
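You need to distinguish between stored and indexed. The filters you add to a field change the tokens that go into Solr's index for searching, but they do not change the stored value that is displayed.
Solr keeps two versions of a field*. One is the stored version: the original text, which in your case contains the HTML. The other is the indexed version, to which all the magic of text analysis has been applied.
When you run a search, the index is used to find the documents that match. When the results are displayed, the stored version is shown.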
*Only, of course, if you have set both stored="true" and indexed="true".
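To make the stored/indexed split concrete, here is a minimal toy model in Python. It is not how Solr is implemented; `ToyField`, `analyze`, `strip_html`, and `search` are hypothetical names invented for this sketch, and the analysis chain is reduced to HTML stripping, whitespace splitting, and lowercasing. It only illustrates that analysis affects what is matched, not what is returned.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw):
    """Rough stand-in for an HTML-stripping char filter (assumption, not Solr code)."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(parser.parts)


def analyze(raw):
    """Toy analysis chain: strip HTML, split on whitespace, lowercase."""
    return [tok.lower() for tok in strip_html(raw).split()]


class ToyField:
    """Hypothetical model of one field that keeps two versions of its value."""

    def __init__(self, raw):
        self.stored = raw                  # returned verbatim in results
        self.indexed = set(analyze(raw))   # used only for matching


def search(docs, term):
    """Match against the indexed tokens, but return the stored values."""
    return [d.stored for d in docs if term.lower() in d.indexed]


docs = [
    ToyField("<p>Hello <b>Solr</b> world</p>"),
    ToyField("<i>Other</i> text"),
]
hits = search(docs, "solr")
print(hits)  # ['<p>Hello <b>Solr</b> world</p>']
```

In this sketch a search for "solr" matches because the tags were stripped before tokenizing, yet the hit that comes back is the original markup. A search for "b" finds nothing, since the tag name never reached the index. If the displayed summary must be free of HTML, the value has to be cleaned before it is sent for indexing.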