I'm running some queries with Apache Phoenix, but their performance seems poor compared to what I expected. For example, consider the following table:
CREATE TABLE MY_SHORT_TABLE (
MPK BIGINT not null,
... 38 other columns ...
CONSTRAINT pk PRIMARY KEY (MPK, 4 other columns))
SALT_BUCKETS = 4;
It holds 460,000 rows, and a query like this:
select sum(MST.VALUES),
MST.III, MST.BBB, MST.DDD, MST.FFF,
MST.AAA, MST.CCC, MST.EEE, MST.HHH
from
MY_SHORT_TABLE MST
group by
MST.AAA, MST.BBB, MST.CCC, MST.DDD,
MST.EEE, MST.FFF, MST.HHH, MST.III
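takes 9 to 11 seconds to complete. On a table with a similar structure but nearly 3,400,000 rows, the query takes 45 seconds to complete.
I have 5 hosts in this cluster (1 master and 4 RegionServers + PhoenixQS), with 6 vCPUs and 32 GB RAM.
The configuration I'm using for this example is: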
HBase RegionServer Maximum Memory=8192(8GB)
HBase Master Maximum Memory=8192(8GB)
Number of Handlers per RegionServer=30
Memstore Flush Size=128MB
Maximum Record Size=1MB
Maximum Region File Size=10GB
% of RegionServer Allocated to Read Buffers=40%
% of RegionServer Allocated to Write Buffers=40%
HBase RPC Timeout=6min
Zookeeper Session Timeout=6min
Phoenix Query Timeout=6min
Number of Fetched Rows when Scanning from Disk=1000
dfs.client.read.shortcircuit=true
dfs.client.read.shortcircuit.buffer.size=131072
phoenix.coprocessor.maxServerCacheTimeToLiveMs=30000
I'm using HDP 2.4.0, so Phoenix 4.4.
The EXPLAIN output for the example query is:
+------------------------------------------+
| PLAN |
+------------------------------------------+
| CLIENT 8-CHUNK PARALLEL 8-WAY FULL SCAN OVER MY_SHORT_TABLE |
| SERVER AGGREGATE INTO DISTINCT ROWS BY [AAA, BBB, CCC, DDD, EEE, FFF, HHH |
| CLIENT MERGE SORT |
+------------------------------------------+
In addition, I created an index:
CREATE INDEX i1DENORM2T1 ON MY_SHORT_TABLE (HHH)
INCLUDE ( AAA, BBB, CCC, DDD, EEE, FFF, HHH, VALUES ) ;
This index changes the query execution plan to:
+------------------------------------------+
| PLAN |
+------------------------------------------+
| CLIENT 4-CHUNK PARALLEL 4-WAY FULL SCAN OVER I1DENORM2T1 |
| SERVER AGGREGATE INTO DISTINCT ROWS BY ["AAA", "BBB", "DDD", "EEE", "FFF", "HHH |
| CLIENT MERGE SORT |
+------------------------------------------+
However, the performance still does not match expectations (about 3-4 seconds).
Is there anything wrong with the configuration above, or what should I change to get better performance?
Thanks in advance.