Un-nesting nested JSON structures in Apache Drill

Time: 2016-03-14 08:45:51

Tags: json apache-drill

I have the following JSON (roughly), and I would like to extract the information from the header and defects fields separately:

{
  "file": {
    "header": {
      "timeStamp": "2016-03-14T00:20:15.005+04:00",
      "serialNo": "3456",
      "sensorId": "1234567890",
    },
    "defects": [
      {
        "info": {
          "systemId": "DEFCHK123",
          "numDefects": "3",
          "defectParts": [
            "003", "006", "008"
          ]
        }
      }
    ]
  }
}

I tried accessing the individual elements with file.header.timeStamp and so on, but that returns null. I also tried flatten(file), but that gives me

Cannot cast org.apache.drill.exec.vector.complex.MapVector to org.apache.drill.exec.vector.complex.RepeatedValueVector

I have looked at kvgen(), but I don't see how it applies to my case. I tried kvgen(file.header), but that gets me

kvgen function only supports simple maps as input

which is what I expected anyway.

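Does anyone know how I can get at header and defects, so that I can process the information they contain? Ideally, I would just select the fields from header, since it contains no arrays or maps, and take each record as is. For defects, I would simply use FLATTEN(defectParts) to get a table of the defective parts.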

Any help would be appreciated.
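A side note on the sample as posted: the header object in the question ends with a trailing comma after the "sensorId" entry, which strict JSON parsers reject, while the file used in the answer omits it. The thread does not establish whether this is related to the null results or how Drill's reader treats it. A minimal Python sketch for checking a file against a strict parser follows; the shortened fragments and the is_strict_json helper are illustrative assumptions, not part of the original post.

```python
import json

# Shortened header fragment with the trailing comma, as in the question.
posted = '{"header": {"serialNo": "3456", "sensorId": "1234567890",}}'
# Same fragment without the trailing comma, as in the answer's file.
cleaned = '{"header": {"serialNo": "3456", "sensorId": "1234567890"}}'


def is_strict_json(text):
    """Return True if Python's strict JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_strict_json(posted))
print(is_strict_json(cleaned))
```

If the real file carries the same comma, removing it is a cheap first thing to rule out before looking at the queries.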

2 answers:

Answer 0: (score: 6)

Which version of Drill are you using? I tried querying the following file on the latest master (1.7.0-SNAPSHOT):

{
  "file": {
    "header": {
      "timeStamp": "2016-03-14T00:20:15.005+04:00",
      "serialNo": "3456",
      "sensorId": "1234567890"
    },
    "defects": [
      {
        "info": {
          "systemId": "DEFCHK123",
          "numDefects": "3",
          "defectParts": [
            "003", "006", "008"
          ]
        }
      }
    ]
  }
}
{
  "file": {
    "header": {
      "timeStamp": "2016-03-14T00:20:15.005+04:00",
      "serialNo": "3456",
      "sensorId": "1234567890"
    },
    "defects": [
      {
        "info": {
          "systemId": "DEFCHK123",
          "numDefects": "3",
          "defectParts": [
            "003", "006", "008"
          ]
        }
      }
    ]
  }
}

The following queries work fine:

1

select t.file.header.serialno as serialno from `parts.json` t;
+-----------+
| serialno  |
+-----------+
| 3456      |
| 3456      |
+-----------+
2 rows selected (0.098 seconds)

2

select flatten(t.file.defects) defects from `parts.json` t;
+---------------------------------------------------------------------------------------+
|                                        defects                                        |
+---------------------------------------------------------------------------------------+
| {"info":{"systemId":"DEFCHK123","numDefects":"3","defectParts":["003","006","008"]}}  |
| {"info":{"systemId":"DEFCHK123","numDefects":"3","defectParts":["003","006","008"]}}  |
+---------------------------------------------------------------------------------------+

3

select q.h.serialno as serialno, q.d.info.defectParts as defectParts from (select t.file.header h, flatten(t.file.defects) d from `parts.json` t) q;
+-----------+----------------------+
| serialno  |     defectParts      |
+-----------+----------------------+
| 3456      | ["003","006","008"]  |
| 3456      | ["003","006","008"]  |
+-----------+----------------------+
2 rows selected (0.126 seconds)
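Query 3 still returns defectParts as an array per row. The question asks for one row per defective part, which in Drill would presumably mean applying flatten once more to q.d.info.defectParts; that variant was not run here. As a rough illustration of the row shape such a double flatten would be expected to produce, here is a plain-Python sketch. It is not Drill code, and flatten_rows is a hypothetical helper name.

```python
import json

# One record shaped like the records in parts.json above.
record = json.loads("""
{
  "file": {
    "header": {
      "timeStamp": "2016-03-14T00:20:15.005+04:00",
      "serialNo": "3456",
      "sensorId": "1234567890"
    },
    "defects": [
      {
        "info": {
          "systemId": "DEFCHK123",
          "numDefects": "3",
          "defectParts": ["003", "006", "008"]
        }
      }
    ]
  }
}
""")


def flatten_rows(rec):
    """Yield one (serialNo, defectPart) pair per defective part."""
    header = rec["file"]["header"]
    # Outer loop: one step per element of the defects array.
    for defect in rec["file"]["defects"]:
        # Inner loop: one step per element of the defectParts array.
        for part in defect["info"]["defectParts"]:
            yield header["serialNo"], part


rows = list(flatten_rows(record))
for row in rows:
    print(row)
```

Each header value is repeated once per array element, which is the same repetition query 2 shows for the outer defects array.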

PS: This should have been a comment, but I don't have enough rep yet!

Answer 1: (score: 0)