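I am unable to use Solr's DataImportHandler to index XML files that are in Solr's add schema, even though I can index the same files if I send them over HTTP as an update request.
The problem is with the nested entities in the documents: I cannot get them indexed correctly.
Here is an example of an XML file to be indexed: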
<add commitWithin="5000">
<doc>
<field name="id">1</field>
<field name="type">Document</field>
<doc>
<field name="id">1_1</field>
<field name="nested_status">Nested</field>
</doc>
<field name="isParent">true</field>
</doc>
</add>
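For comparison, the HTTP route that does index the nested document can be sketched roughly as below. This is only a minimal illustration, not necessarily the exact client code: the base URL `http://localhost:8983/solr/test_core` and the helper names `build_add_xml` and `post_update` are assumptions made up for the example.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_xml(parent_fields, children, commit_within=5000):
    """Build a Solr <add> message with child <doc> elements nested in the parent."""
    add = ET.Element("add", {"commitWithin": str(commit_within)})
    parent = ET.SubElement(add, "doc")
    for name, value in parent_fields:
        ET.SubElement(parent, "field", {"name": name}).text = value
    for child_fields in children:
        # Each child is a <doc> element placed inside the parent <doc>.
        child = ET.SubElement(parent, "doc")
        for name, value in child_fields:
            ET.SubElement(child, "field", {"name": name}).text = value
    return ET.tostring(add, encoding="unicode")


def post_update(xml_body, base_url="http://localhost:8983/solr/test_core"):
    """POST the XML to the /update handler (base_url is a hypothetical core URL)."""
    req = urllib.request.Request(
        base_url + "/update",
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Same fields as the sample file above; call post_update(payload) to send it.
payload = build_add_xml(
    [("id", "1"), ("type", "Document"), ("isParent", "true")],
    [[("id", "1_1"), ("nested_status", "Nested")]],
)
```

The builder writes the parent's fields before the nested `<doc>`, whereas the sample file places `isParent` after it; the parent/child structure is otherwise the same.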
My data-config.xml:
<dataConfig>
<dataSource name="Test_XML"
type="FileDataSource"
encoding="ISO_8859_1"/>
<document>
<entity name="doc"
processor="XPathEntityProcessor"
stream="true"
useSolrAddSchema="true"
url="LOCATION\useSolrAddSchema_test.xml">
<entity name="nested_doc"
processor="XPathEntityProcessor"
stream="true"
useSolrAddSchema="true"
child="true"
url="LOCATION\useSolrAddSchema_test.xml">
</entity>
</entity>
</document>
</dataConfig>
The debug response is:
{
"responseHeader": {
"status": 0,
"QTime": 162
},
"initArgs": [
"defaults",
[
"config",
"data-config.xml"
]
],
"command": "full-import",
"mode": "debug",
"documents": [
{
"isParent": [
"true"
],
"id": [
"1"
],
"type": [
"Document"
],
"_version_": [
1549343328462438400
],
"_root_": [
"1"
]
}
],
"verbose-output": [],
"status": "idle",
"importResponse": "",
"statusMessages": {
"Total Requests made to DataSource": "0",
"Total Rows Fetched": "2",
"Total Documents Processed": "1",
"Total Documents Skipped": "0",
"Full Dump Started": "2016-10-27 11:48:59",
"": "Indexing completed. Added/Updated: 1 documents. Deleted 0 documents.",
"Committed": "2016-10-27 11:48:59",
"Time taken": "0:0:0.149"
}
}
So it ignores the nested document, and when I query for all indexed documents, I get two copies of the outer document:
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"q":"*:*",
"indent":"on",
"wt":"json",
"_":"1477568715268"}},
"response":{"numFound":2,"start":0,"docs":[
{
"id":"1",
"type":"Document"},
{
"id":"1",
"type":"Document",
"_version_":1549343328462438400}]
}}
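One way to check whether anything was actually attached as a child is to ask Solr for the parents together with their children through the `[child]` document transformer. The following is a rough sketch under assumptions: the core URL is hypothetical, `isParent` is assumed to be an indexed field that marks parent documents, and the helper names are invented for the example.

```python
import json
import urllib.parse
import urllib.request


def build_select_url(base_url, parent_filter="isParent:true"):
    """Build a /select URL that returns parent docs with their child docs."""
    params = {
        "q": parent_filter,
        # [child] is Solr's child document transformer; parentFilter
        # must match all parent documents in the index.
        "fl": "*,[child parentFilter=%s]" % parent_filter,
        "wt": "json",
    }
    return base_url + "/select?" + urllib.parse.urlencode(params)


def fetch_docs(url):
    """Run the query and return the list of matching documents."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["response"]["docs"]


# Hypothetical core URL; pass the result to fetch_docs() against a live Solr.
url = build_select_url("http://localhost:8983/solr/test_core")
```

If the import had worked as intended, the parent with id 1 should come back with the document 1_1 attached as its child rather than a second copy of itself.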
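I looked at this question, where the accepted answer says that nested entities are not possible, but since Solr 5.1 it should be possible by using the child='True' attribute.
I am currently using Solr version 6.2.1, but would prefer a solution that is also compatible with older versions.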