I am trying to run this query to fetch some data. The files I have on S3 under s3://my_datalake/my_table/year=2018/month=9/day=7/ total 1.1 TB in size, spread across 10,014 snappy.parquet objects.
SELECT array_join(array_agg(distinct endpoint),',') as endpoints_all, count(endpoint) as count_endpoints
FROM my_datalake.my_table
WHERE year=2018 and month=09 and day=07
and ts between timestamp '2018-09-07 00:00:00' and timestamp '2018-09-07 23:59:59'
and status = '2'
GROUP BY domain, size, device_id, ip
But I am getting this error:
Query exhausted resources at this scale factor
(Run time: 6 minutes 41 seconds, Data scanned: 153.87GB)
I have YEAR, MONTH, DAY and HOUR partitions. How should I run this query? Can I do it with Amazon Athena, or do I need to use a different tool?
The schema of my table is:
CREATE EXTERNAL TABLE `ssp_request_prueba`(
`version` string,
`adunit` string,
`adunit_original` string,
`brand` string,
`country` string,
`device_connection_type` string,
`device_density` string,
`device_height` string,
`device_id` string,
`device_type` string,
`device_width` string,
`domain` string,
`endpoint` string,
`endpoint_version` string,
`external_dfp_id` string,
`id_req` string,
`ip` string,
`lang` string,
`lat` string,
`lon` string,
`model` string,
`ncc` string,
`noc` string,
`non` string,
`os` string,
`osv` string,
`scc` string,
`sim_operator_code` string,
`size` string,
`soc` string,
`son` string,
`source` string,
`ts` timestamp,
`user_agent` string,
`status` string,
`delivery_network` string,
`delivery_time` string,
`delivery_status` string,
`delivery_network_key` string,
`delivery_price` string,
`device_id_original` string,
`tracking_limited` string,
`from_cache` string,
`request_price` string)
PARTITIONED BY (
`year` int,
`month` int,
`day` int,
`hour` int)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
's3://my_datalake/my_table'
TBLPROPERTIES (
'has_encrypted_data'='false',
'transient_lastDdlTime'='1538747353')
Answer 0 (score: 1)
The problem is probably related to the array_join and array_agg functions. I suspect that in this case the memory limit of a node in the Athena service is being exceeded. Combining these functions, Athena may not be able to handle such a large amount of data.
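Purely as an illustration (this sketch is not part of the original answer; the table, column, and partition names are taken from the question's schema): if the comma-joined list of endpoints is not actually needed, replacing array_join(array_agg(distinct endpoint), ',') with a distinct count avoids building the full per-group array in memory, and the existing hour partition lets the day be processed in smaller slices.

-- Hedged sketch only: counts distinct endpoints instead of concatenating them,
-- and scans a single hour partition at a time (run once per hour slice).
-- The ts filter is dropped on the assumption that year/month/day/hour already
-- bound the time range; keep it if ts can disagree with the partition values.
SELECT count(DISTINCT endpoint) AS count_distinct_endpoints,
       count(endpoint) AS count_endpoints
FROM my_datalake.my_table
WHERE year = 2018 AND month = 9 AND day = 7 AND hour = 0
  AND status = '2'
GROUP BY domain, size, device_id, ip

If an estimate is acceptable, Presto's approx_distinct(endpoint) is cheaper still, since it aggregates a fixed-size sketch per group rather than the exact set of values.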