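Importing data from CSV, XML, or JSON files into Solr using dataimport

Asked: 2019-01-15 10:42:51

Tags: json solr import-from-csv

I am trying to import data from CSV, XML, and JSON files into a Solr core. I am new to Solr, so this may be simple for some people, but I have tried many suggestions found online without getting the result I expect.

I have a JSON file, and I enabled data import by adding the following requestHandler to solrconfig.xml: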

<requestHandler  name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
        <str name="config">solr-data-config.xml</str>
    </lst>
</requestHandler>
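
A handler declaration like the one above usually only loads if the DataImportHandler jars are on the core's classpath. A minimal sketch of the matching `<lib>` directive in solrconfig.xml follows, assuming a Solr release that still ships DIH in its `dist/` directory (it was removed from the main distribution in Solr 9) and the default install layout; the relative fallback path is an assumption and may need adjusting:

```xml
<!-- Assumption: the DIH jars live under <solr install dir>/dist/.
     The part after the colon is a fallback used when solr.install.dir is not set. -->
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
```

If the jars are missing, the core normally fails to load with a class-not-found error for `DataImportHandler`, so a handler that responds at all suggests this part is already in place.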

In solr-data-config.xml:

<dataConfig>
  <dataSource name="dfs" type="FileDataSource"/>
  <document>
    <entity name="sourcefile" processor="FileListEntityProcessor" fileName=".*" rootEntity="false" baseDir="${solr.install.dir}/example/exampledocs">
      <entity name="entryline" processor="LineEntityProcessor" url="${sourcefile.fileAbsolutePath}" rootEntity="true" dataSource="fds" separator=","/>
    </entity>
  </document>
</dataConfig>
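
One thing that stands out in the config above is that the data source is declared as `dfs` while the inner entity references `fds`. Also, as far as I know, `separator` is not a documented `LineEntityProcessor` attribute: the processor emits each line in a single column named `rawLine`, and splitting it is left to a transformer. A minimal sketch of a line-by-line CSV import, assuming a hypothetical two-column file (`id,name`) with no quoted commas and no header row:

```xml
<dataConfig>
  <!-- The name here must match the dataSource attribute on the inner entity -->
  <dataSource name="fds" type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- Outer entity only lists files, so it needs no data source -->
    <entity name="sourcefile" processor="FileListEntityProcessor"
            baseDir="${solr.install.dir}/example/exampledocs"
            fileName=".*\.csv$" rootEntity="false" dataSource="null">
      <!-- Each line of the file becomes one row with a single column, rawLine -->
      <entity name="entryline" processor="LineEntityProcessor"
              url="${sourcefile.fileAbsolutePath}" dataSource="fds"
              rootEntity="true" transformer="RegexTransformer">
        <!-- Assumed columns: first value is the id, second is the name -->
        <field column="id" sourceColName="rawLine" regex="^([^,]*),.*$"/>
        <field column="name" sourceColName="rawLine" regex="^[^,]*,([^,]*).*$"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

The `id` and `name` columns are placeholders for illustration; the real column names and the regular expressions depend on the actual file layout.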

Updated version:

<dataConfig>
  <script><![CDATA[
    // Splits the slash-separated manu_id_s value, stores each trimmed piece
    // in its own column (manu_id_s0, manu_id_s1, ...), and writes the
    // normalized value back to manu_id_s.
    function CategoryPieces(row) {
      var value = row.get('manu_id_s');
      if (value == null) {
        return row; // nothing to split for this row
      }
      var pieces = value.split('/');
      var arr = new Array();
      for (var i = 0; i < pieces.length; i++) {
        arr[i] = pieces[i].trim();
        row.put('manu_id_s' + i, arr[i]);
      }
      row.put('manu_id_s', arr.join('/'));
      return row;
    }
  ]]></script>
  <dataSource type="FileDataSource" />
  <document>
    <!-- Lists the XML files in baseDir; produces no documents itself -->
    <entity
      name="document"
      processor="FileListEntityProcessor"
      baseDir="${solr.install.dir}/example/exampledocs/khaled"
      fileName=".*\.xml$"
      recursive="false"
      rootEntity="false"
      dataSource="null">
      <!-- Parses each file as Solr <add><doc> XML and runs the script transformer -->
      <entity
        name="test"
        processor="XPathEntityProcessor"
        transformer="script:CategoryPieces"
        url="${document.fileAbsolutePath}"
        useSolrAddSchema="true"
        stream="true">
      </entity>
    </entity>
  </document>
</dataConfig>

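When I run dataimport with a JSON, a CSV, and an XML file placed in that path, the Solr UI reports something like Requests: 0, Fetched: 26, Skipped: 0, Processed: 0, and nothing shows up in the logs. Can anyone suggest how to add the data from these files to Solr?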

This is the sample data in my XML file:

<add>
<doc>
  <field name="id">USD</field>
  <field name="name">One Dollar</field>
  <field name="manu">Bank of America</field>
  <field name="manu_id_s">boa</field>
  <field name="cat">currency</field>
  <field name="features">Coins and notes</field>
  <field name="price_c">1,USD</field>
  <field name="inStock">true</field>
</doc>

<doc>
  <field name="id">EUR</field>
  <field name="name">One Euro</field>
  <field name="manu">European Union</field>
  <field name="manu_id_s">eu</field>
  <field name="cat">currency</field>
  <field name="features">Coins and notes</field>
  <field name="price_c">1,EUR</field>
  <field name="inStock">true</field>
</doc>

<doc>
  <field name="id">GBP</field>
  <field name="name">One British Pound</field>
  <field name="manu">U.K.</field>
  <field name="manu_id_s">uk</field>
  <field name="cat">currency</field>
  <field name="features">Coins and notes</field>
  <field name="price_c">1,GBP</field>
  <field name="inStock">true</field>
</doc>

<doc>
  <field name="id">NOK</field>
  <field name="name">One Krone</field>
  <field name="manu">Bank of Norway</field>
  <field name="manu_id_s">nor</field>
  <field name="cat">currency</field>
  <field name="features">Coins and notes</field>
  <field name="price_c">1,NOK</field>
  <field name="inStock">true</field>
</doc>

</add>
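
For data in this `<add><doc>` layout, an alternative to `useSolrAddSchema="true"` is to map each field explicitly with `forEach` and `xpath`. The following is a sketch only, reusing the directory and entity names from the updated config and the field names from the sample data; whether every column is accepted still depends on the core's schema:

```xml
<dataConfig>
  <dataSource type="FileDataSource" encoding="UTF-8"/>
  <document>
    <entity name="document" processor="FileListEntityProcessor"
            baseDir="${solr.install.dir}/example/exampledocs/khaled"
            fileName=".*\.xml$" recursive="false"
            rootEntity="false" dataSource="null">
      <!-- One Solr document per /add/doc element in each file -->
      <entity name="test" processor="XPathEntityProcessor"
              url="${document.fileAbsolutePath}"
              forEach="/add/doc" stream="true">
        <field column="id"        xpath="/add/doc/field[@name='id']"/>
        <field column="name"      xpath="/add/doc/field[@name='name']"/>
        <field column="manu"      xpath="/add/doc/field[@name='manu']"/>
        <field column="manu_id_s" xpath="/add/doc/field[@name='manu_id_s']"/>
        <field column="cat"       xpath="/add/doc/field[@name='cat']"/>
        <field column="features"  xpath="/add/doc/field[@name='features']"/>
        <field column="price_c"   xpath="/add/doc/field[@name='price_c']"/>
        <field column="inStock"   xpath="/add/doc/field[@name='inStock']"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Explicit mappings are more verbose, but they make it visible which columns the import is expected to produce, which helps when some fields silently go missing.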

The query now returns the following data:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"*:*",
      "_":"1547556805546"}},
  "response":{"numFound":4,"start":0,"docs":[
      {
        "manu_id_s":"boa",
        "id":"USD",
        "_version_":1622731088078045184},
      {
        "manu_id_s":"eu",
        "id":"EUR",
        "_version_":1622731088081190912},
      {
        "manu_id_s":"uk",
        "id":"GBP",
        "_version_":1622731088081190913},
      {
        "manu_id_s":"nor",
        "id":"NOK",
        "_version_":1622731088082239488}]
  }}
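
Only `id` and `manu_id_s` come back in this response. One possible explanation, which is an assumption and not something confirmed here, is that the other fields are not defined in the core's schema: `manu_id_s` would match a `*_s` dynamic field, while DataImportHandler tends to drop columns that match neither a schema field nor a field declared in the data config. A sketch of schema entries that would cover the sample data, assuming the `text_general`, `string`, and `boolean` field types from the default configset are available:

```xml
<!-- Hypothetical additions to the core's schema (managed-schema or schema.xml).
     Field types are assumed to exist already; adjust them to the actual schema. -->
<field name="name"     type="text_general" indexed="true" stored="true"/>
<field name="manu"     type="text_general" indexed="true" stored="true"/>
<field name="cat"      type="string"       indexed="true" stored="true" multiValued="true"/>
<field name="features" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="inStock"  type="boolean"      indexed="true" stored="true"/>
```

`price_c` is left out on purpose because it needs a currency field type, whose definition varies between Solr versions and configsets.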

0 answers:

No answers yet.