I am trying to optimize a Hive query. I have partitioned my base table and stored it as ORC files, as shown below:
create table if not exists processed (
plc string,
direction string,
`table` int,
speed float,
`time` string
) PARTITIONED BY (time_id bigint) STORED AS ORC;
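I run the following query against the table above (which contains 500,000 records), and the final result is stored as JSON. The whole transaction takes about 35 seconds. Is there a way to reduce this time? Alternatively, can anyone suggest a framework other than Hive for this? Here is the query: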
// plcCSV: comma-separated list of plc values; time_id: the partition to query
String finalQuery = "select plc, direction, AVG(speed) as speed, COUNT(plc) as count, time_id"
        + " from processed"
        + " where plc IN (" + plcCSV + ")"
        + " and time_id = " + time_id
        + " group by plc, direction, time_id";
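For comparison, here is a minimal sketch of the same query built with JDBC "?" placeholders instead of concatenating plcCSV and time_id into the string. The class QueryBuilder and the method buildQuery are hypothetical names for illustration; it assumes the plc values are available as a list rather than a pre-joined CSV, and that your Hive JDBC driver supports PreparedStatement parameter binding (worth checking for your driver version). The query text itself is unchanged, so this does not by itself make the query faster.

```java
import java.util.Collections;

public class QueryBuilder {
    // Hypothetical helper: returns the aggregation query with one "?" per plc value
    // and one "?" for the time_id partition value, to be bound via PreparedStatement.
    public static String buildQuery(int plcCount) {
        if (plcCount < 1) {
            throw new IllegalArgumentException("need at least one plc value");
        }
        // e.g. plcCount = 3 -> "?, ?, ?"
        String placeholders = String.join(", ", Collections.nCopies(plcCount, "?"));
        return "select plc, direction, AVG(speed) as speed, COUNT(plc) as count, time_id"
                + " from processed"
                + " where plc IN (" + placeholders + ")"
                + " and time_id = ?"
                + " group by plc, direction, time_id";
    }

    public static void main(String[] args) {
        // Prints the query for two plc values
        System.out.println(buildQuery(2));
    }
}
```

With a java.sql.PreparedStatement created from this string, bind each plc value with setString(i, value) for i = 1..n and the partition value with setLong(n + 1, time_id).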
Answer (score: 0)
Create an index on the plc column first, then try again.