Unable to trim trailing whitespace when indexing data into Solr?

Time: 2019-04-22 07:46:42

Tags: solr solrcloud

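I have a SolrCloud setup with 3 ZooKeeper nodes and 2 Solr instances. I am indexing data from XML files (nested documents) into Solr through DIH, and I want to strip trailing whitespace from the field values so that it does not show up in search results.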

Sample file:

<product>
<doc>
   <sku>...</sku>
   <data>
     <date>..</date>
     <store>..</store>
     <econn>..</econn>
   </data>
</doc>
...
...
</product>

I have not shared the DIH configuration, as the import itself is working fine.

I have tried the approaches from both of these links:

https://stackoverflow.com/questions/24570545/is-it-possible-to-get-solrs-dataimporthadler-to-ignore-fields-with-empty-string

https://fossies.org/linux/solr/solr/example/example-DIH/solr/atom/conf/solrconfig.xml

Actual file:
<doc>
   <sku>abc </sku>
   <data>
     <date>2019-19-08</date>
     <store>somestore </store>
     <econn>false </econn>
   </data>
</doc>

Expected output after indexing:
<doc>
   <sku>abc</sku>
   <data>
     <date>2019-19-08</date>
     <store>somestore</store>
     <econn>false</econn>
   </data>
</doc>

Trailing spaces should be trimmed in both the parent and the child documents, or in just one of them, depending on the context.

1 answer:

Answer 0: (score: 0)

The solution that worked best for me was to apply the RegexTransformer in the data-config.xml file.

<entity name="foo" transformer="RegexTransformer">
<field column="new_field" xpath="path/to/field/in/xml" regex="(\s|\t)" replaceWith="" />
...
...
...
...
</entity>
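
One caveat: `(\s|\t)` matches every whitespace character, not only trailing ones, so a value such as `some store ` would end up as `somestore`. If interior spaces need to survive, a variant along the following lines may be closer to what the question asks for. This is only a sketch and has not been run against the asker's setup: the entity name `foo`, the column names, and the xpath values are placeholders guessed from the sample file, and the other entity attributes (processor, url, forEach) are left out. It assumes that RegexTransformer applies the pattern to the column itself when no `sourceColName` is given and replaces every match.

```xml
<!-- Sketch only: entity name, column names, and xpaths are placeholders -->
<entity name="foo" transformer="RegexTransformer">
  <!-- \s+$ matches one or more whitespace characters at the end of the value,
       so spaces inside the value are left untouched -->
  <field column="sku" xpath="/doc/sku" regex="\s+$" replaceWith="" />
  <field column="store" xpath="/doc/data/store" regex="\s+$" replaceWith="" />
</entity>
```

To strip leading whitespace as well, the pattern `^\s+|\s+$` should do the job with the same `replaceWith=""`.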

Sometimes the answer really is that simple!