Hi, I am using Apache Phoenix to run SQL queries over HBase.
Table schema:
CREATE TABLE TABLE_1 (
SF_ID VARCHAR NOT NULL,
ENTITY_ID VARCHAR NOT NULL,
PRODUCT_SKU VARCHAR,
CITY_NAME VARCHAR,
SCREEN_NAME VARCHAR,
PRODUCT_LIST_VIEWS BIGINT,
PRODUCT_LIST_CLICKS BIGINT,
PRODUCT_LIST_CTR FLOAT,
TIMESTAMP BIGINT NOT NULL,
POS INTEGER NOT NULL,
CONSTRAINT pk PRIMARY KEY (SF_ID, ENTITY_ID, TIMESTAMP, POS));
I have created a secondary index as follows:
CREATE INDEX GA_2 ON TABLE_1 (ENTITY_ID) INCLUDE (PRODUCT_LIST_VIEWS, PRODUCT_LIST_CLICKS, PRODUCT_LIST_CTR);
However, the following query takes about 1.5 s to 2 s when run against 500,000 rows:
select ENTITY_ID as "entityId", sum(PRODUCT_LIST_VIEWS) as "productViewSum", sum(PRODUCT_LIST_CLICKS) as "productClickSum", sum(PRODUCT_LIST_CTR) as "productCTRSum" from "TABLE_1" group by ENTITY_ID;
The explain plan is as follows:
CLIENT 1-CHUNK 0 ROWS 0 BYTES PARALLEL 1-WAY FULL SCAN OVER GA_2
SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ENTITY_ID"]
Is there any way to improve the response time of this query?
Update: Based on the explain plan, I created a salted table with 12 buckets.
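For reference, a salted table of this kind can be declared with the SALT_BUCKETS table option. The statements below are only a sketch: the names GA_TABLE_2 and GA_3 are taken from the queries and plan in this post, but the column list and the index definition are assumed to mirror TABLE_1 and GA_2, and the DDL actually run may have differed.

```sql
-- Sketch only: columns are assumed to match TABLE_1 above.
-- SALT_BUCKETS = 12 prepends a salt byte to each row key,
-- spreading rows across 12 buckets.
CREATE TABLE GA_TABLE_2 (
SF_ID VARCHAR NOT NULL,
ENTITY_ID VARCHAR NOT NULL,
PRODUCT_SKU VARCHAR,
CITY_NAME VARCHAR,
SCREEN_NAME VARCHAR,
PRODUCT_LIST_VIEWS BIGINT,
PRODUCT_LIST_CLICKS BIGINT,
PRODUCT_LIST_CTR FLOAT,
TIMESTAMP BIGINT NOT NULL,
POS INTEGER NOT NULL,
CONSTRAINT pk PRIMARY KEY (SF_ID, ENTITY_ID, TIMESTAMP, POS))
SALT_BUCKETS = 12;

-- Assumed covered index, defined the same way as GA_2.
CREATE INDEX GA_3 ON GA_TABLE_2 (ENTITY_ID) INCLUDE (PRODUCT_LIST_VIEWS, PRODUCT_LIST_CLICKS, PRODUCT_LIST_CTR);
```

With 12 buckets the scan is split into 12 parallel chunks, which matches the "12-CHUNK PARALLEL 12-WAY" wording in the plan.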
The explain plan now looks like this:
+-------------------------------------------------------------------+----------+
| PLAN | EST_BYTE |
+-------------------------------------------------------------------+----------+
| CLIENT 12-CHUNK PARALLEL 12-WAY FULL SCAN OVER GA_3 | null |
| SERVER AGGREGATE INTO ORDERED DISTINCT ROWS BY ["ENTITY_ID"] | null |
| CLIENT MERGE SORT | null |
+-------------------------------------------------------------------+----------+
However, the response time is still the same.
One more observation:
If I do not use SUM in the query, the response is very fast.
For example:
select ENTITY_ID, SUM(PRODUCT_LIST_VIEWS) from GA_TABLE_2 where SF_ID = '1' group by ENTITY_ID;
This query took 631 ms.
However,
select ENTITY_ID from GA_TABLE_2 where SF_ID = '1' group by ENTITY_ID;
This one took only 30 ms.