Question

Hello, I have a CSV file whose fields are separated by tabs:

id  name    subject description comments
c4e 10181   Hello1  d1  1
741 10181   Hello2  d2  2
b62 10181   Hello3  d3  3
fd4 10181   Hello4  d4  4
2fb 10181   Hello5  d5  5

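I want to parse it with regular expressions using Solr's RegexTransformer while importing it through the DataImportHandler (DIH), but the regular expressions do not work as intended: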

 <field column="id" sourceColName="rawLine" regex="^(.*)\t(.*)\t(.*)\t(.*)\t"/>
 <field column="name" sourceColName="rawLine" regex="\t(.*)\t(.*)\t(.*)\t(.*)$"/>
 <field column="subject" sourceColName="rawLine" regex="\t(.*)\t(.*)\t(.*)$"/>
 <field column="description" sourceColName="rawLine" regex="\t(.*)\t(.*)$"/>
 <field column="comments" sourceColName="rawLine" regex="\t(.*)$"/>

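The subject, description and comments fields come out wrong: they also contain the preceding fields. What is wrong with the regular expressions?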

Answer 1

From your description, I would say this is a greediness problem. Would it help to replace each occurrence of .* with .*? in the last 3 lines?
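To check the greediness idea outside Solr, here is a minimal sketch in Python. It assumes that Python's `re` module treats these particular patterns the same way as the Java regex engine Solr runs on, and that the sample row really contains tab characters. The names `FIELDS`, `ROW` and `split_row` are hypothetical, made up for this illustration. It suggests that lazy quantifiers alone may not be enough: the search still starts at the leftmost tab, so the groups only shift. A negated class `[^\t]*`, which cannot cross a tab, keeps each group to one field.

```python
import re

# Sample row from the question, with real tab characters between fields.
line = "c4e\t10181\tHello1\td1\t1"

# The question's greedy "subject" pattern: the first group runs across a tab.
greedy = re.search(r"\t(.*)\t(.*)\t(.*)$", line).groups()

# Lazy quantifiers alone: the match still begins at the leftmost tab,
# so the groups shift and the last one runs across a tab instead.
lazy = re.search(r"\t(.*?)\t(.*?)\t(.*?)$", line).groups()

# Hypothetical alternative: [^\t]* can never cross a tab, so each group
# holds exactly one field of a five-column row.
FIELDS = ("id", "name", "subject", "description", "comments")
ROW = re.compile(r"^([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)\t([^\t]*)$")


def split_row(text):
    """Map field names to values, or return None if the row is not five tab-separated fields."""
    m = ROW.match(text)
    return dict(zip(FIELDS, m.groups())) if m else None


print(greedy)
print(lazy)
print(split_row(line))
```

If this carries over to DIH, a single anchored pattern like `ROW` could replace the five separate ones; as far as I recall, RegexTransformer's `groupNames` attribute maps the groups of one regex to several columns, but check that against the DIH documentation for your Solr version.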

Solr Regex - Parsing tab-separated CSV
