We are trying to improve the read performance of our HBase setup, accessed through the Apache Phoenix driver, on a dataset of ~11.5M records.
HBase 0.98
Apache Phoenix驱动程序4.3.1
Squirrel Client 3.2
The table has 21 columns; the DDL is below:
create table *table_name* (
    PKEY BIGINT not null primary key,
    DATE_KEY BIGINT,
    TIMEOFDAY_KEY BIGINT,
    GMT_TZ_PKEY BIGINT,
    FACT_DATE TIMESTAMP,
    PAGE_KEY BIGINT,
    FFER_KEY BIGINT,
    OFFER_TYPE_KEY BIGINT,
    SESSION_KEY BIGINT,
    CUSTOMER_KEY BIGINT,
    VISITS_CNTR BIGINT,
    ELIGIBLE_CNTR smallint,
    PRESENTED_CNTR smallint,
    ACCEPTED smallint,
    ACCEPTED_CLICK smallint,
    FIRST_RESPONSE_CNTR smallint,
    REJECTED_CNTR smallint,
    IS_FIXED smallint,
    IGNORED_CNTR smallint,
    ENGAGED_CNTR smallint,
    CONVERTED_CNTR smallint
)
We salted the table (salt_buckets = 3) and created a secondary index (an immutable index) on all columns.
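Salting prepends a one-byte bucket number, derived from a hash of the row key modulo the bucket count, so that monotonically increasing keys such as PKEY are spread across buckets instead of hot-spotting a single region. The Python sketch below illustrates the idea. It assumes a Java-style 31-multiplier hash over the key bytes and an 8-byte big-endian encoding of PKEY; the helper names are hypothetical, and Phoenix's actual hash function and key encoding may differ.

```python
import struct
from collections import Counter


def java_style_hash(data: bytes) -> int:
    """31-multiplier hash over signed bytes with 32-bit wraparound (hypothetical stand-in)."""
    result = 1
    for b in data:
        signed = b - 256 if b > 127 else b  # Java bytes are signed
        result = (31 * result + signed) & 0xFFFFFFFF  # emulate 32-bit int overflow
    # Reinterpret the unsigned 32-bit value as a signed Java int
    return result - (1 << 32) if result >= (1 << 31) else result


def salt_byte(row_key: bytes, salt_buckets: int) -> int:
    """Bucket number in [0, salt_buckets) for this row key."""
    # Same value as Java's Math.abs(hash % n), because Java's % truncates toward zero
    return abs(java_style_hash(row_key)) % salt_buckets


def salted_key(pkey: int, salt_buckets: int = 3) -> bytes:
    """Prefix the (assumed) 8-byte big-endian PKEY encoding with its salt byte."""
    raw = struct.pack(">q", pkey)
    return bytes([salt_byte(raw, salt_buckets)]) + raw


# Usage: how 10,000 sequential PKEY values spread over 3 salt buckets
distribution = Counter(salted_key(k)[0] for k in range(1, 10_001))
print(sorted(distribution.items()))
```

Because consecutive keys hash to consecutive values here, sequential inserts rotate through the buckets rather than all landing at the end of one key range.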
We are running the following queries; the timings reported by the Squirrel client are listed under each one:
Select count(*) from *table_name*:
Query Time (A) = 0.031 s
Transport time (B) = 2.631 s
Total Execution Time (A+B) = 2.661 s
Execution plan:
PLAN
CLIENT 6-CHUNK PARALLEL 6-WAY
FULL SCAN OVER OFR_FCT_IDX_SALTED
SERVER FILTER BY FIRST KEY ONLY
SERVER AGGREGATE INTO SINGLE ROW
CLIENT 100 ROW LIMIT
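One way to read this plan: the index is scanned in six parallel chunks, each chunk returns a single partial count (SERVER AGGREGATE INTO SINGLE ROW, reading only the first key-value of each row because of FIRST KEY ONLY), and the client adds up the partials. The Python sketch below models that with a thread pool. The chunking helper and function names are hypothetical, and real Phoenix chunks follow region and statistics boundaries rather than a round-robin split.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(rows, n_chunks):
    """Hypothetical helper: deal rows round-robin into n_chunks scan ranges."""
    return [rows[i::n_chunks] for i in range(n_chunks)]


def count_chunk(chunk):
    # Server side: one partial COUNT(*) per chunk (SERVER AGGREGATE INTO SINGLE ROW).
    # Only row presence matters, mirroring FIRST KEY ONLY: row contents are never read.
    return sum(1 for _ in chunk)


def parallel_count(chunks, workers=6):
    # Client side: scan all chunks in parallel (6-WAY) and add up the partial counts.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_chunk, chunks))


rows = list(range(11_500))  # hypothetical stand-in rows
print(parallel_count(split_into_chunks(rows, 6)))
```

Only six small integers cross the "wire" in this model, which is why a COUNT(*) over the whole index can still be relatively quick.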
select MAX(session_key) from *table_name* group by TIMEOFDAY_KEY having count(SESSION_KEY) > 100 order by TIMEOFDAY_KEY (431 rows returned):
Query Time (A) = 0.04 s
Transport time (B) = 11.894 s
Total Execution Time (A+B) = 11.934 s
Execution plan:
PLAN
CLIENT 6-CHUNK PARALLEL 6-WAY FULL SCAN OVER OFR_FCT_IDX_SALTED
SERVER FILTER BY FIRST KEY ONLY
SERVER AGGREGATE INTO DISTINCT ROWS BY ["TIMEOFDAY_KEY"]
CLIENT MERGE SORT
CLIENT FILTER BY COUNT(TO_BIGINT("SESSION_KEY")) > 100
CLIENT SORTED BY ["TIMEOFDAY_KEY"]
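This second plan splits the GROUP BY between servers and client: each chunk aggregates its rows per TIMEOFDAY_KEY, the client merges those partial aggregates, applies the HAVING filter, and sorts. Note that, per the plan, the HAVING filter runs on the client, so every group's partial aggregate is shipped from the servers before any filtering happens. The Python sketch below walks through those steps, assuming in-memory lists of (TIMEOFDAY_KEY, SESSION_KEY) tuples as the chunks; it merges with a dictionary rather than a true merge sort, and the function names are hypothetical.

```python
def server_partial_aggregate(rows):
    """Per chunk: TIMEOFDAY_KEY -> (count of non-null SESSION_KEY, max SESSION_KEY)."""
    partial = {}
    for tod, sess in rows:
        cnt, mx = partial.get(tod, (0, None))
        if sess is not None:  # COUNT(col) and MAX(col) ignore NULLs
            cnt += 1
            mx = sess if mx is None else max(mx, sess)
        partial[tod] = (cnt, mx)
    return partial


def _max_nullable(a, b):
    if a is None:
        return b
    if b is None:
        return a
    return max(a, b)


def client_merge(partials):
    # CLIENT MERGE SORT (modeled here as a dict merge): combine per-chunk partials
    merged = {}
    for partial in partials:
        for tod, (cnt, mx) in partial.items():
            c0, m0 = merged.get(tod, (0, None))
            merged[tod] = (c0 + cnt, _max_nullable(m0, mx))
    return merged


def run_query(chunks, min_count=100):
    merged = client_merge(server_partial_aggregate(c) for c in chunks)
    # CLIENT FILTER BY COUNT(SESSION_KEY) > min_count
    kept = {tod: mx for tod, (cnt, mx) in merged.items() if cnt > min_count}
    # CLIENT SORTED BY TIMEOFDAY_KEY; the key is kept for readability (the SQL projects only MAX)
    return [(tod, kept[tod]) for tod in sorted(kept)]


example = [[(1, 10), (2, 5), (1, 30)], [(1, 20), (2, None), (3, 7)]]
print(run_query(example, min_count=1))  # [(1, 30)]
```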
As you can see, the query time is negligible, but the transport time (the time spent reading/outputting the results) is very high.
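Squirrel's "Query Time" roughly corresponds to how long the execute call takes to return, while "Transport time" covers reading the rows back. With a driver that evaluates results lazily, much of the actual scanning can land in the second number, so a small query time does not necessarily mean the server did little work. The sketch below times the two phases separately through Python's DB-API, using an in-memory SQLite table named `facts` as a hypothetical stand-in for the Phoenix table; it is not Phoenix-specific.

```python
import sqlite3
import time


def time_query(conn, sql):
    """Return (execute_seconds, fetch_seconds, rows), split like Query vs Transport time."""
    cur = conn.cursor()
    t0 = time.perf_counter()
    cur.execute(sql)  # phase 1: submit the statement
    t1 = time.perf_counter()
    rows = cur.fetchall()  # phase 2: pull every result row to the client
    t2 = time.perf_counter()
    cur.close()
    return t1 - t0, t2 - t1, rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facts (pkey INTEGER PRIMARY KEY, timeofday_key INTEGER, session_key INTEGER)")
conn.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    ((i, i % 1440, i // 7) for i in range(50_000)),  # hypothetical synthetic data
)
exec_s, fetch_s, rows = time_query(
    conn,
    "SELECT MAX(session_key) FROM facts GROUP BY timeofday_key "
    "HAVING COUNT(session_key) > 30 ORDER BY timeofday_key",
)
print(f"execute: {exec_s:.4f}s, fetch: {fetch_s:.4f}s, rows: {len(rows)}")
```

Comparing the two numbers for the same query under different drivers or settings is a quick way to see which phase a change actually affects.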
My question is as follows:
Answer 0 (score: 0):
Upgrade to at least Apache Phoenix 4.7; apparently that release made protobuf the default serialization method (instead of JSON).
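To give a feel for why a binary wire format can reduce transport cost, the sketch below encodes the same hypothetical result set (431 pairs of TIMEOFDAY_KEY and MAX(SESSION_KEY), with made-up values) both as JSON and as fixed-width binary. The `struct` packing is only a stand-in: it is not protobuf, which uses field tags and varint encoding, and the savings from an actual Phoenix upgrade will differ.

```python
import json
import struct

# Hypothetical result set: 431 (TIMEOFDAY_KEY, MAX(SESSION_KEY)) pairs with made-up values
rows = [(tod, 1_000_000 + tod * 37) for tod in range(431)]


def encode_json(rows):
    """Self-describing text encoding: field names are repeated in every row."""
    return json.dumps([{"TIMEOFDAY_KEY": t, "SESSION_KEY": s} for t, s in rows]).encode("utf-8")


def encode_binary(rows):
    """Fixed-width stand-in: 4-byte row count, then two big-endian int64 values per row."""
    return struct.pack(">I", len(rows)) + b"".join(struct.pack(">qq", t, s) for t, s in rows)


def decode_binary(buf):
    (n,) = struct.unpack_from(">I", buf, 0)
    return [struct.unpack_from(">qq", buf, 4 + 16 * i) for i in range(n)]


print(len(encode_json(rows)), len(encode_binary(rows)))
```

The binary form avoids repeating field names and decimal digits for every row, and decoding it needs no text parsing, which is where a schema-based format tends to save both bytes and CPU time.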