How do I create an Athena table from a nested JSON file? Here is my sample JSON file. I only need selected key-value pairs, such as roofCondition and garageStalls.
{ "reportId":"7bc7fa76-bf53-4c21-85d6-118f6a8f4244",
"reportOrderedTS":"1529996028730",
"createdTS":"1530304910154",
"report":"{'summaryElements': [{'value': 'GOOD', 'key': 'roofCondition'},
{'value': '98', 'key': 'storiesConfidence'}{'value': '0', 'key':
'garageStalls'}], 'elements': [{'source': 'xyz', 'imageId': '0xxx_png',
'modelVersion': '1.21.0', 'key': 'pool'}, {'source': 'xyz', 'imageId': '0111_png', 'value': 'GOOD', 'modelVersion': '1.36.0', 'key': 'roofCondition','confidence': '49'}], }", "status":"Success", "reportReceivedTS":"1529996033830" }
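Answer 0 (score: 4)
First, the JSON document you posted is malformed: "report" is a quoted string (with single-quoted keys) instead of a nested object, one pair of summaryElements entries is missing the comma between them, and there is a trailing comma before the closing brace. The correct version should look like this: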
{"reportId":"7bc7fa76-bf53-4c21-85d6-118f6a8f4244", "reportOrderedTS":"1529996028730", "createdTS":"1530304910154", "report":{"summaryElements": [{"value": "GOOD", "key": "roofCondition"},{"value": "98", "key": "storiesConfidence"},{"value": "0", "key": "garageStalls"}], "elements": [{"source": "xyz", "imageId": "0xxx_png", "modelVersion": "1.21.0", "key": "pool"},{"source": "xyz", "imageId": "0111_png", "value": "GOOD", "modelVersion": "1.36.0", "key": "roofCondition", "confidence": "49"}] }, "status":"Success", "reportReceivedTS":"1529996033830" }
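If the real files look like the question's sample, with "report" stored as a Python-style string, they have to be rewritten before the JsonSerDe can map "report" to a struct. Below is a minimal Python sketch, assuming each input record is already on one line and that the "report" string is a valid Python literal. The function name normalize_record is hypothetical. ast.literal_eval accepts single quotes and trailing commas, which json.loads rejects, and it only evaluates literals, so it is safe on untrusted text. It cannot repair the missing comma between the `}{` entries in the sample; that has to be fixed at the source.

```python
import ast
import json


def normalize_record(raw_line: str) -> str:
    """Rewrite one record so that `report` becomes a real nested object.

    Assumes `raw_line` is a single valid JSON object whose `report` field
    may be a Python-literal string such as "{'key': 'value', }".
    """
    record = json.loads(raw_line)
    report = record.get("report")
    if isinstance(report, str):
        # literal_eval parses single-quoted strings and tolerates trailing
        # commas; it evaluates literals only, never arbitrary code.
        record["report"] = ast.literal_eval(report)
    # json.dumps escapes any newlines inside values, so the output is
    # always one line, which is what the JsonSerDe expects per record.
    return json.dumps(record, separators=(",", ":"))
```

For example, run it over each non-empty line of a source file and write the results to the S3 prefix used as the table LOCATION.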
Yes, you can query nested JSON in Athena. To do this, create the following table:
CREATE EXTERNAL TABLE example(
`reportId` string,
`reportOrderedTS` bigint,
`createdTS` bigint,
`report` struct<
`summaryElements`: array<struct<`value`:string, `key`: string>>,
`elements`: array<struct<`source`: string, `imageId`:string, `modelVersion`:string, `key`:string, `value`:string, `confidence`:int>>>,
`status` string,
`reportReceivedTS` bigint
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://example'
Here is an example query:
select reportid, reportorderedts, createdts,
summaryelements.value, summaryelements.key as summary_key, elements.source, elements.key as element_key
from example, UNNEST(report.summaryelements) t1(summaryelements), UNNEST(report.elements) t2(elements)
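Because both UNNEST calls sit in the same FROM clause, every summary element is paired with every element (a cross join). For the sample record, that means 3 × 2 = 6 rows. The plain-Python sketch below only illustrates those semantics; it is not Athena code. It also shows the narrower result the question asks for: only roofCondition and garageStalls. In Athena, that filter would presumably be a single UNNEST of report.summaryelements with a WHERE clause on the unnested row's key, for example `WHERE s.key IN ('roofCondition', 'garageStalls')` (untested). The functions unnest_both and wanted_summary and the trimmed record are hypothetical.

```python
from itertools import product

# Hypothetical record mirroring the corrected JSON (some fields trimmed).
record = {
    "reportId": "7bc7fa76-bf53-4c21-85d6-118f6a8f4244",
    "report": {
        "summaryElements": [
            {"value": "GOOD", "key": "roofCondition"},
            {"value": "98", "key": "storiesConfidence"},
            {"value": "0", "key": "garageStalls"},
        ],
        "elements": [
            {"source": "xyz", "key": "pool"},
            {"source": "xyz", "key": "roofCondition", "value": "GOOD"},
        ],
    },
}


def unnest_both(rec):
    """Mimic FROM example, UNNEST(summaryelements), UNNEST(elements):
    each summary element is paired with each element (cross join)."""
    rep = rec["report"]
    return [
        (rec["reportId"], s["value"], s["key"], e["source"], e["key"])
        for s, e in product(rep["summaryElements"], rep["elements"])
    ]


def wanted_summary(rec, keys=("roofCondition", "garageStalls")):
    """Mimic one UNNEST of summaryelements filtered on key,
    pivoted into a single dict per report."""
    out = {"reportId": rec["reportId"]}
    for s in rec["report"]["summaryElements"]:
        if s["key"] in keys:
            out[s["key"]] = s["value"]
    return out
```

Unnesting only the array you need avoids the row multiplication entirely.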
Useful links:
https://docs.aws.amazon.com/athena/latest/ug/flattening-arrays.html
https://docs.aws.amazon.com/athena/latest/ug/rows-and-structs.html
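Answer 1 (score: 0)
So this also seems to work, even though the file as a whole is not valid JSON!
Each raw row of the table is one line in the JSON file.
Nothing follows the end of a line: where a JSON array would put a comma between rows, there is just a newline.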
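The layout described in this answer is newline-delimited JSON (often called JSON Lines). The JsonSerDe reads one record per line, so a file containing a JSON array, or objects pretty-printed over several lines, has to be flattened first. Here is a minimal Python sketch, assuming the input is a single JSON array (or a single object) small enough to load into memory. The function name to_json_lines is hypothetical.

```python
import json


def to_json_lines(text: str) -> str:
    """Convert a JSON array of records, possibly pretty-printed across many
    lines, into newline-delimited JSON: one compact record per line and no
    commas between records."""
    records = json.loads(text)
    if isinstance(records, dict):
        # A lone object is treated as a one-record file.
        records = [records]
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in records) + "\n"
```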