I have the following JSON (roughly), and I want to extract information from the header and defects fields separately:
{
"file": {
"header": {
"timeStamp": "2016-03-14T00:20:15.005+04:00",
"serialNo": "3456",
"sensorId": "1234567890",
},
"defects": [
{
"info": {
"systemId": "DEFCHK123",
"numDefects": "3",
"defectParts": [
"003", "006", "008"
]
}
}
]
}
}
I tried accessing individual elements with file.header.timeStamp and so on, but that returns null. I tried flatten(file), but that gave me:
Cannot cast org.apache.drill.exec.vector.complex.MapVector to org.apache.drill.exec.vector.complex.RepeatedValueVector
I have looked at kvgen(), but I don't see how it applies to my case. I tried kvgen(file.header), but that gives me:
kvgen function only supports Simple maps as input
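Does anyone know how I can get at header and defects so that I can process the information they contain? Ideally, I would just select the fields from header, since it contains no arrays or maps, so I can take the individual records as they are. For defects, I would simply use FLATTEN(defectParts) to get a table of the defective parts.
Any help would be appreciated.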
Answer 0 (score: 6)
Which version of Drill are you using? I tried querying the following file on the latest master (1.7.0-SNAPSHOT):
{
"file": {
"header": {
"timeStamp": "2016-03-14T00:20:15.005+04:00",
"serialNo": "3456",
"sensorId": "1234567890"
},
"defects": [
{
"info": {
"systemId": "DEFCHK123",
"numDefects": "3",
"defectParts": [
"003", "006", "008"
]
}
}
]
}
}
{
"file": {
"header": {
"timeStamp": "2016-03-14T00:20:15.005+04:00",
"serialNo": "3456",
"sensorId": "1234567890"
},
"defects": [
{
"info": {
"systemId": "DEFCHK123",
"numDefects": "3",
"defectParts": [
"003", "006", "008"
]
}
}
]
}
}
The following queries work fine:
1.
select t.file.header.serialno as serialno from `parts.json` t;
+-----------+
| serialno |
+-----------+
| 3456 |
| 3456 |
+-----------+
2 rows selected (0.098 seconds)
2.
select flatten(t.file.defects) defects from `parts.json` t;
+---------------------------------------------------------------------------------------+
| defects |
+---------------------------------------------------------------------------------------+
| {"info":{"systemId":"DEFCHK123","numDefects":"3","defectParts":["003","006","008"]}} |
| {"info":{"systemId":"DEFCHK123","numDefects":"3","defectParts":["003","006","008"]}} |
+---------------------------------------------------------------------------------------+
3.
select q.h.serialno as serialno, q.d.info.defectParts as defectParts from (select t.file.header h, flatten(t.file.defects) d from `parts.json` t) q;
+-----------+----------------------+
| serialno | defectParts |
+-----------+----------------------+
| 3456 | ["003","006","008"] |
| 3456 | ["003","006","008"] |
+-----------+----------------------+
2 rows selected (0.126 seconds)
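Building on the queries above, here is an untested sketch of the two results the question asks for: one row per record with the header fields, and one row per defective part. It assumes the same `parts.json` file and the table alias t. The column aliases ts and defectPart are made up for illustration. The alias t matters: without one, Drill is likely to read file.header.timeStamp as table.column, which may explain the null in the question. timeStamp is quoted with backticks in case it clashes with the TIMESTAMP keyword.

```sql
-- Header fields: plain map access through the table alias, no FLATTEN needed.
select t.file.header.`timeStamp` as ts,
       t.file.header.serialNo    as serialNo,
       t.file.header.sensorId    as sensorId
from `parts.json` t;

-- Defective parts: flatten the defects array in a subquery,
-- then flatten the nested defectParts array in the outer query.
select q.h.serialNo                  as serialNo,
       flatten(q.d.info.defectParts) as defectPart
from (select t.file.header h, flatten(t.file.defects) d
      from `parts.json` t) q;
```

If the nested FLATTEN behaves as expected, the second query should return one row per entry of defectParts for each record, with serialNo repeated on every row.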
PS: This should have been a comment, but I don't have enough reputation yet!
Answer 1 (score: 0)
I have no experience with Apache Drill, but I looked through the manual. Isn't this what you are looking for?