Using useSolrAddSchema with nested entities

Date: 2016-10-27 11:52:34

Tags: solr dataimporthandler

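I cannot get Solr's DataImportHandler to index an XML file that is written in Solr's add schema, even though the same file is indexed fine when I send it as an update request over HTTP.

The problem is the nested entities inside the document. I cannot get them indexed correctly.

Here is an example of the XML file to be indexed: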

<add commitWithin="5000">
  <doc>
    <field name="id">1</field>
    <field name="type">Document</field>
    <doc>
      <field name="id">1_1</field>
      <field name="nested_status">Nested</field>
    </doc>
    <field name="isParent">true</field>
  </doc>
</add>
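
A minimal sketch of the HTTP update route mentioned above, for comparison with the DataImportHandler setup. The host, port, and core name `mycore` are assumptions for illustration and do not come from the question; `build_update_request` and `send` are hypothetical helper names. The request is only built here; sending it needs a running Solr instance.

```python
import urllib.request

# Assumed placeholder: host, port, and core name are not taken from the question.
SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"

# The same <add> message as in the example file above.
ADD_XML = """<add commitWithin="5000">
  <doc>
    <field name="id">1</field>
    <field name="type">Document</field>
    <doc>
      <field name="id">1_1</field>
      <field name="nested_status">Nested</field>
    </doc>
    <field name="isParent">true</field>
  </doc>
</add>"""


def build_update_request(xml_text, url=SOLR_UPDATE_URL):
    """Build (but do not send) a POST request carrying a Solr <add> message."""
    return urllib.request.Request(
        url,
        data=xml_text.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def send(request):
    """Send the request and return the HTTP status; requires a reachable Solr."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

With a Solr core available at the assumed URL, `send(build_update_request(ADD_XML))` would post the parent document together with its nested child in one update message.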

My data-config.xml:

<dataConfig>
  <dataSource name="Test_XML"
              type="FileDataSource"
              encoding="ISO_8859_1"/>
  <document>
    <entity name="doc"
            processor="XPathEntityProcessor"
            stream="true"
            useSolrAddSchema="true"
            url="LOCATION\useSolrAddSchema_test.xml">
      <entity name="nested_doc"
              processor="XPathEntityProcessor"
              stream="true"
              useSolrAddSchema="true"
              child="true"
              url="LOCATION\useSolrAddSchema_test.xml">
      </entity>
    </entity>
  </document>
</dataConfig>

The debug response is:

{
  "responseHeader": {
    "status": 0,
    "QTime": 162
  },
  "initArgs": [
    "defaults",
    [
      "config",
      "data-config.xml"
    ]
  ],
  "command": "full-import",
  "mode": "debug",
  "documents": [
    {
      "isParent": [
        "true"
      ],
      "id": [
        "1"
      ],
      "type": [
        "Document"
      ],
      "_version_": [
        1549343328462438400
      ],
      "_root_": [
        "1"
      ]
    }
  ],
  "verbose-output": [],
  "status": "idle",
  "importResponse": "",
  "statusMessages": {
    "Total Requests made to DataSource": "0",
    "Total Rows Fetched": "2",
    "Total Documents Processed": "1",
    "Total Documents Skipped": "0",
    "Full Dump Started": "2016-10-27 11:48:59",
    "": "Indexing completed. Added/Updated: 1 documents. Deleted 0 documents.",
    "Committed": "2016-10-27 11:48:59",
    "Time taken": "0:0:0.149"
  }
}

So it ignores the nested document, and when I run a query that returns all indexed documents, I get two copies of the outer document:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"*:*",
      "indent":"on",
      "wt":"json",
      "_":"1477568715268"}},
  "response":{"numFound":2,"start":0,"docs":[
      {
        "id":"1",
        "type":"Document"},
      {
        "id":"1",
        "type":"Document",
        "_version_":1549343328462438400}]
  }}
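
One way to check whether a child document was attached to its parent is to query the parents and ask Solr to return their children through the `[child]` document transformer. The sketch below only builds such a query URL. The host, port, and core name `mycore` are assumptions, `build_child_query_url` is a hypothetical helper name, and the sketch assumes `isParent` is an indexed field that marks parent documents.

```python
from urllib.parse import urlencode

# Assumed placeholder: host, port, and core name are not taken from the question.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_child_query_url(parent_filter="isParent:true", base_url=SOLR_SELECT_URL):
    """Build a select URL that returns parent documents with their children."""
    params = {
        # Match only parent documents.
        "q": parent_filter,
        # The [child] transformer attaches nested documents to each parent.
        "fl": "*,[child parentFilter=%s]" % parent_filter,
        "wt": "json",
        "indent": "on",
    }
    return base_url + "?" + urlencode(params)


print(build_child_query_url())
```

If the nested document had been indexed as a child, the parent with id 1 would be expected to come back with the document 1_1 nested inside it, rather than as a second copy of the parent.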

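I looked at this question, where the accepted answer says that nested entities are not possible, but since Solr 5.1 it should be possible with the child='True' attribute.

I am currently using Solr 6.2.1, but I would prefer a solution that is also compatible with older versions.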

0 answers:

No answers yet