I'm trying to perform internal transformations of fields defined in my Solr schema.
I have the following in my schema.xml:
<field name="source_file" type="string" indexed="true" stored="true" docValues="true"/>
<copyField source="source_file_extraction" dest="text"/>
The field source_file contains the basename of my indexed documents (example: 1234_helloworld.pdf). I'd like to use a regex to extract some data from this basename (example: extract all digits (\d*) => 1234) and save the result in the field source_file_extraction.
For that, I've seen that it may be possible to use regex transformers. I configured the file solr-data-config.xml as follows:
<dataConfig>
<document>
<entity name="source_file_extraction" transformer="RegexTransformer" query="select coll from source_file_extraction">
<field column="coll" regex=".*?-(\d\d)-.*" sourceColName="source_file"/>
</entity>
</document>
</dataConfig>
And I added a requestHandler to the file solrconfig.xml:
<requestHandler name="/dataimport" class="solr.DataImportHandler">
<lst name="defaults">
<str name="config">solr-data-config.xml</str>
</lst>
</requestHandler>
But it does not work.
How can I apply a simple regex transformation to a field defined in the schema and store the result in another field of the same schema?
Thanks in advance for your help.
Answer 0 (score: 1)
Use the solr.PatternReplaceFilterFactory filter for the field "source_file_extraction".
Define the field source_file_extraction and its field type as follows:
<field name="source_file_extraction" type="NameExtractor" indexed="true" stored="true"/>
<fieldType name="NameExtractor" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.PatternReplaceFilterFactory" pattern="([^0-9])" replacement="" replace="all"/>
</analyzer>
</fieldType>
Add a copyField from source_file to source_file_extraction:
<copyField source="source_file" dest="source_file_extraction"/>
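When the value is copied to the field source_file_extraction, the analyzer applies the filter and keeps only the numeric characters of that value in the indexed term. Because analysis only affects what is indexed, the stored value returned in search results is still the original string.
It does not modify the value of the source_file field.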
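To see what the analyzer chain above does to a value, here is a minimal Python sketch that mimics it outside Solr. The function name extract_digits is hypothetical, and Python's re module is only a stand-in for the Java regex engine Solr uses; for a simple character class like [^0-9] the two should behave the same.

```python
import re

# Same pattern as in the PatternReplaceFilterFactory definition:
# match any single character that is not a digit.
NON_DIGIT = re.compile(r"[^0-9]")


def extract_digits(value: str) -> str:
    """Mimic KeywordTokenizer + pattern replace (replace="all").

    The whole input is treated as one token, and every non-digit
    character in it is replaced with the empty string.
    """
    return NON_DIGIT.sub("", value)


if __name__ == "__main__":
    print(extract_digits("1234_helloworld.pdf"))  # 1234
    print(extract_digits("2020_report_v2.pdf"))   # 20202
```

Note that the pattern keeps every digit in the name, not just the leading run, so a basename such as 2020_report_v2.pdf yields 20202. If only the leading number is wanted, a different pattern would be needed.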
Don't forget to restart Solr after modifying the schema.
Hope this helps, Vinod