Defining Solr child documents in DataImportHandler with a URLDataSource

Date: 2017-12-14 06:54:49

Tags: indexing solr nested parent-child dataimporthandler

Solr's DataImportHandler supports parent and child entities for JDBC data sources. How can I define a parent-child relationship for a URLDataSource? My sample dataset:

<name>ABC</name>
<createdAt>1512016450886</createdAt>
<createdBy>XYZ</createdBy>
<attributes>
    <attribute>
        <name>access</name>
        <value>public</value>
    </attribute>
    <attribute>
        <name>owner</name>
        <value>ABC</value>
    </attribute>
    <attribute>
        <name>url</name>
        <value>planning</value>
    </attribute>
</attributes>

And this is the indexed output I need:

{
  "name": "ABC",
  "createdAt": "1512016450886",
  "createdBy": "XYZ",
  "Attributes": [
    {
      "name": "access",
      "value": "public"
    },
    {
      "name": "owner",
      "value": "ABC"
    },
    {
      "name": "url",
      "value": "planning"
    }
  ]
}

Sample data config:

<dataConfig>
  <dataSource type="URLDataSource"/>
  <document>
    <entity name="sample"
            url="http://host:port/api/sample_api.xml"
            processor="XPathEntityProcessor" 
            forEach="/hash/name">
        <field column="id" name="id" xpath="/hash/name"/> 
        <field column="createdBy" name="createdBy" xpath="/hash/createdBy"/>
        <field column="createdAt" name="createdAt" xpath="/hash/createdAt"/>
        <field column="attributes" name="attributes" xpath="/hash/attributes"/>
        <field column="attributes.name" name="attributes.name" xpath="/hash/attributes/attribute/name"/>
        <field column="attributes.value" name="attributes.value" xpath="/hash/attributes/attribute/value"/>
   </entity>
  </document>
</dataConfig> 

The response is:

{
  "name": "ABC",
  "createdAt": "1512016450886",
  "createdBy": "XYZ",
  "attributes.name": ["access", "owner", "url"],
  "attributes.value": ["public", "ABC", "planning"]
}
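
To illustrate what this flat result loses: the name/value pairing survives only by position in two parallel multivalued fields. The following is a minimal Python sketch, not Solr code, that pairs them back up on the client side. The values are copied from the response above, the variable names are illustrative, and it assumes both lists come back in the same order and with the same length.

```python
# Flat, parallel multivalued fields as returned by the first config.
flat = {
    "attributes.name": ["access", "owner", "url"],
    "attributes.value": ["public", "ABC", "planning"],
}

# Pair the values by position; this is only valid while both lists stay aligned.
pairs = [
    {"name": name, "value": value}
    for name, value in zip(flat["attributes.name"], flat["attributes.value"])
]

print(pairs)
```

If any attribute in the source XML had a name but no value, the two lists would drift out of alignment, which is one reason to prefer real child documents here.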

I then tried this new data-config.xml:

<dataConfig>
  <script>
    <![CDATA[
    id = 1;
    function f1(row) { row.put('attr.attrId', (id++).toFixed()); return row; }
    ]]>
  </script>
  <dataSource type="URLDataSource"/>
  <document>
    <entity name="entity"
            url="http://abc:9090/api/sample_api.xml"
            processor="XPathEntityProcessor"
            forEach="/hash/entity/entity">
      <field column="id" name="id" xpath="/hash/entity/entity/name"/>
      <field column="createdBy" name="createdBy" xpath="/hash/entity/entity/createdBy"/>
      <entity name="attributes"
              url="http://abc:9090/api/sample_api.xml"
              child="true"
              processor="XPathEntityProcessor"
              forEach="/hash/entity/entity/xyz/xyz"
              transformer="script:f1">
        <field column="attr.attrId" name="attr.attrId"/>
        <field column="attr.attrName" name="attr.attrName" xpath="/hash/entity/entity/xyz/xyz/name"/>
        <field column="attr.attrValue" name="attr.attrValue" xpath="/hash/entity/entity/xyz/xyz/value"/>
      </entity>
    </entity>
  </document>
</dataConfig>

But I get the following error in solr.log:

[   x:xml_data] o.a.s.h.d.SolrWriter Error creating document : SolrInputDocument(fields: [createdBy=XYZ, id=ABC, _version_=1587094252791267328, _root_=ABC], children: [SolrInputDocument(fields: [attr.attrName=access, attr.attrId=1, attr.attrValue=public, _root_=ABC, _version_=1587094252791267328]), SolrInputDocument(fields: [attr.attrName=access12, attr.attrId=2, attr.attrValue=public12, _root_=ABC, _version_=1587094252791267328])])
org.apache.solr.common.SolrException: [doc=null] missing required field: id
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:265)
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:107)
    at org.apache.solr.update.AddUpdateCommand$1.next(AddUpdateCommand.java:212)
    at org.apache.solr.update.AddUpdateCommand$1.next(AddUpdateCommand.java:185)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:259)
    at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:433)
    at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1384)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocument(DirectUpdateHandler2.java:920)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:913)
    at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:302)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:239)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:194)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:979)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1192)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:748)
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
    at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:80)
    at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:254)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:526)
    at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:414)
    at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:329)
    at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:232)
    at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:415)
    at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:474)
    at org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:457)
    at java.lang.Thread.run(Thread.java:748)

1 Answer:

Answer 0: (score: 0)

Actually, this is quite easy to do: child="true" is supported.

Here is what the updated data-config.xml looks like:

<dataConfig>
<dataSource type="URLDataSource" encoding="utf-8" />
<document>
  <entity name="entity"
          url="path:to:xml"
          processor="XPathEntityProcessor"
          forEach="/hash/entity">
    <field column="id" name="id" xpath="/hash/entity/name" />
    <field column="createdBy" name="createdBy" xpath="/hash/entity/createdBy" />
    <field column="createdAt" name="createdAt" xpath="/hash/entity/createdAt" />
    <entity name="attributes"
            url="path:to:xml"
            child="true" processor="XPathEntityProcessor" forEach="/hash/entity/attributes/attribute">
      <field column="name" xpath="/hash/entity/attributes/attribute/name" />
      <field column="value" xpath="/hash/entity/attributes/attribute/value" />
    </entity>
  </entity>
</document>
</dataConfig>

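What changed compared with yours: to create child documents, you need a nested entity with child="true". The nested entity also needs its own data path (url) and the same processor. In addition, some of your XPaths were incorrect.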
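
On the error from the question: one plausible reading of "missing required field: id" is that it refers to the child documents, not the parent. The SolrInputDocument dump in the log shows the parent with id=ABC, while the two children carry attr.attrId rather than id. Child documents are indexed as documents in their own right, so a schema that marks id as required would reject them. The sketch below is plain Python, not Solr code; the function docs_missing_field and the dict layout are hypothetical and exist only to make that check explicit.

```python
def docs_missing_field(doc, field="id"):
    """Return every document in a nested tree that lacks `field`."""
    missing = [] if field in doc["fields"] else [doc]
    for child in doc.get("children", []):
        missing.extend(docs_missing_field(child, field))
    return missing


# Field values copied from the SolrInputDocument dump in the question's log
# (_root_ and _version_ omitted for brevity).
logged_doc = {
    "fields": {"createdBy": "XYZ", "id": "ABC"},
    "children": [
        {"fields": {"attr.attrName": "access", "attr.attrId": "1",
                    "attr.attrValue": "public"}},
        {"fields": {"attr.attrName": "access12", "attr.attrId": "2",
                    "attr.attrValue": "public12"}},
    ],
}

for doc in docs_missing_field(logged_doc):
    print("no id:", doc["fields"])
```

The child entity in the config above maps only name and value, so if your schema requires id, the children would presumably need an id field of their own as well, for example one generated by a transformer as in the question's script but written to the id column.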

The API XML should be well-formed (previously you had several top-level tags instead of a single root tag):

<hash>
  <entity>
    <name>ABC</name>
    <createdAt>1512016450886</createdAt>
    <createdBy>XYZ</createdBy>
    <attributes>
      <attribute>
        <name>access</name>
        <value>public</value>
      </attribute>
      <attribute>
        <name>owner</name>
        <value>ABC</value>
      </attribute>
      <attribute>
        <name>url</name>
        <value>planning</value>
      </attribute>
    </attributes>
  </entity>
</hash>
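
To make the intended mapping concrete, here is a minimal Python sketch that mirrors what the two entities are meant to produce from this XML: one parent per /hash/entity and one child per attributes/attribute. It uses only the standard library and does not talk to Solr. The function name build_nested_docs and the "attributes" key are illustrative choices taken from the desired output in the question, not anything Solr defines.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<hash>
  <entity>
    <name>ABC</name>
    <createdAt>1512016450886</createdAt>
    <createdBy>XYZ</createdBy>
    <attributes>
      <attribute><name>access</name><value>public</value></attribute>
      <attribute><name>owner</name><value>ABC</value></attribute>
      <attribute><name>url</name><value>planning</value></attribute>
    </attributes>
  </entity>
</hash>"""


def build_nested_docs(xml_text):
    """Build one parent dict per <entity>, with its attributes as children."""
    root = ET.fromstring(xml_text)  # root element is <hash>
    docs = []
    for entity in root.findall("entity"):  # like forEach="/hash/entity"
        parent = {
            "id": entity.findtext("name"),
            "createdBy": entity.findtext("createdBy"),
            "createdAt": entity.findtext("createdAt"),
            "attributes": [],
        }
        # like forEach="/hash/entity/attributes/attribute", scoped to this entity
        for attr in entity.findall("attributes/attribute"):
            parent["attributes"].append(
                {"name": attr.findtext("name"), "value": attr.findtext("value")}
            )
        docs.append(parent)
    return docs


docs = build_nested_docs(SAMPLE_XML)
print(docs)
```

Note that the sketch scopes each attribute to its own <entity>. The nested entity in the config reads the same URL again with an absolute XPath, so for a file with several <entity> elements it is worth verifying that each parent receives only its own attributes.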