Applying analysis to a copyField

Date: 2013-07-19 14:35:53

Tags: solr solr4

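I want to copy a field and apply an additional analyzer to the copy. I know how to make the copy (<copyField source="source" dest="dest"/>), but what I really want is to run a different analyzer (ASCIIFoldingFilterFactory) on the copy.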

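How do I change the type of the copyField so that I can run that additional analyzer? Do I need to change the type at all, or can I simply run an extra analyzer?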

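I believe I could solve this with a dynamic field that matches the copy field's name and change the type that way, but wouldn't that create an extra copy of my data?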

1 answer:

Answer 0: (score: 4)

You just need to define a new fieldType and declare the copyField's destination field with that type.

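For example, below, text_syn is a type that applies one set of analysis filters, and text_stop_syn_stem is another type that applies more of them (adding stop word removal and stemming):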

<types>
    ...
    <fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="0" catenateAll="1" splitOnCaseChange="1"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.LowerCaseFilterFactory"/>        
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            <filter class="solr.ASCIIFoldingFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
            <filter class="solr.LowerCaseFilterFactory"/>        
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            <filter class="solr.ASCIIFoldingFilterFactory"/>
        </analyzer>
    </fieldType>

    <fieldType name="text_stop_syn_stem" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="0" catenateAll="1" splitOnCaseChange="1"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
            <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
            <filter class="solr.PorterStemFilterFactory"/>
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            <filter class="solr.ASCIIFoldingFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
            <filter class="solr.PorterStemFilterFactory"/>        
            <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            <filter class="solr.ASCIIFoldingFilterFactory"/>
        </analyzer>
    </fieldType>
    ...
</types>

In the fields section we have:

<field name="name_syn" type="text_syn" indexed="true" stored="true" />
<field name="name_stop_syn_stem" type="text_stop_syn_stem" indexed="true" stored="false" />

The copyField then looks like:

<copyField source="name_syn" dest="name_stop_syn_stem" />
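
Applied to the case in the question, a minimal sketch might look like the following. The names title, title_folded, and text_folded are hypothetical, and the tokenizer and lowercase filter are assumptions chosen for illustration; only ASCIIFoldingFilterFactory comes from the question. Because copyField passes the original, pre-analysis value to the destination, the destination's own analyzer chain is the only one applied to the copy.

```xml
<!-- Hypothetical type: a single analyzer (no "type" attribute) is used
     at both index and query time. -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- The additional filter asked about in the question -->
        <filter class="solr.ASCIIFoldingFilterFactory"/>
    </analyzer>
</fieldType>

<!-- Hypothetical source field; its type is assumed to be defined elsewhere -->
<field name="title" type="text_general" indexed="true" stored="true"/>

<!-- Destination: indexed for searching, not stored -->
<field name="title_folded" type="text_folded" indexed="true" stored="false"/>

<!-- Copies the raw value of title into title_folded before analysis -->
<copyField source="title" dest="title_folded"/>
```

With stored="false" on the destination, only the indexed terms are kept for the copy, so no second stored copy of the raw text is written. The destination could also be matched by a dynamicField declaration instead of an explicit field; the analysis applied is determined by the destination's type either way.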