Question

I am trying to optimize a Hive query. I have partitioned my base table and stored it as ORC files, as shown below.

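-- `table` and `time` are reserved words in recent Hive versions, so they are quoted with backticks
create table if not exists processed (
    plc string,
    direction string,
    `table` int,
    speed float,
    `time` string
) PARTITIONED BY (time_id bigint) STORED AS ORC;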
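
Because the table is partitioned by time_id, Hive keeps each partition in its own subdirectory named time_id=<value> under the table location. A query that filters on time_id = X therefore only has to read the ORC files of one partition. The Java sketch below models only that pruning step on a list of directory names. It does not call Hive, and the class name, method names and time_id values are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

final class PartitionPruningSketch {

    // Hive names each partition directory "<partition column>=<value>".
    static String partitionDir(long timeId) {
        return "time_id=" + timeId;
    }

    // Keep only the partition directories matching the filter value,
    // which is what "WHERE time_id = X" allows before any ORC file is opened.
    static List<String> prune(List<String> partitionDirs, long timeId) {
        String wanted = partitionDir(timeId);
        return partitionDirs.stream()
                .filter(wanted::equals)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical partitions; a real table lists them with SHOW PARTITIONS.
        List<String> dirs = List.of(partitionDir(1L), partitionDir(2L), partitionDir(3L));
        System.out.println(prune(dirs, 2L));
    }
}
```

Only the directories that survive this filter are scanned, so the cost of the query grows with the size of one partition rather than the whole table.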

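I ran the following query against the table above (about 500,000 records), and the final result is stored as JSON. The whole operation takes about 35 seconds. Is there a way to reduce this time? Or could someone suggest a framework other than Hive? Here is the query: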

String finalQuery = "select plc, direction, AVG(speed) as speed, COUNT(plc) as count, time_id"
        + " from processed"
        + " WHERE plc IN (" + plcCSV + ")"
        + " AND time_id = " + time_id
        + " group by plc, direction, time_id";
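
The concatenation above inserts plcCSV verbatim, so it assumes the caller has already turned the plc values into valid string literals (plc is a string column), and a stray quote in any value breaks the statement. A minimal sketch of building the same query from a list of plc ids, assuming they are available as a List<String>; PlcQueryBuilder, quote and buildQuery are hypothetical names, and the escaping relies on Hive's backslash escapes inside string literals.

```java
import java.util.List;
import java.util.stream.Collectors;

final class PlcQueryBuilder {

    // Quote one plc value as a HiveQL string literal, escaping backslashes first, then single quotes.
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    // Hypothetical helper: the same aggregation as finalQuery, built from a list of plc ids.
    static String buildQuery(List<String> plcs, long timeId) {
        if (plcs.isEmpty()) {
            // "IN ()" is not valid HiveQL, so reject an empty list up front.
            throw new IllegalArgumentException("plc list must not be empty");
        }
        String inList = plcs.stream()
                .map(PlcQueryBuilder::quote)
                .collect(Collectors.joining(", "));
        return "select plc, direction, AVG(speed) as speed, COUNT(plc) as count, time_id"
                + " from processed"
                + " where plc in (" + inList + ")"
                + " and time_id = " + timeId
                + " group by plc, direction, time_id";
    }

    public static void main(String[] args) {
        // Hypothetical plc ids and partition value.
        System.out.println(buildQuery(List.of("plc1", "plc2"), 1L));
    }
}
```

Taking time_id as a long means only a number can reach that part of the WHERE clause, which also keeps the filter on the partition column intact for pruning.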

Answer 1

First create an index on the plc column, then try the query again.

Optimizing a Hive query
