Improving Apache Phoenix query performance

Time: 2015-12-18 08:02:31

Tags: hbase phoenix

We are trying to improve the read performance of our HBase setup, using the Apache Phoenix driver, on a dataset of ~11.5 M records.

  

  HBase 0.98
  Apache Phoenix driver 4.3.1
  Squirrel Client 3.2

The table consists of 21 columns. The DDL statement is below:

CREATE TABLE *table_name* (
    PKEY                BIGINT NOT NULL PRIMARY KEY,
    DATE_KEY            BIGINT,
    TIMEOFDAY_KEY       BIGINT,
    GMT_TZ_PKEY         BIGINT,
    FACT_DATE           TIMESTAMP,
    PAGE_KEY            BIGINT,
    FFER_KEY            BIGINT,
    OFFER_TYPE_KEY      BIGINT,
    SESSION_KEY         BIGINT,
    CUSTOMER_KEY        BIGINT,
    VISITS_CNTR         BIGINT,
    ELIGIBLE_CNTR       SMALLINT,
    PRESENTED_CNTR      SMALLINT,
    ACCEPTED            SMALLINT,
    ACCEPTED_CLICK      SMALLINT,
    FIRST_RESPONSE_CNTR SMALLINT,
    REJECTED_CNTR       SMALLINT,
    IS_FIXED            SMALLINT,
    IGNORED_CNTR        SMALLINT,
    ENGAGED_CNTR        SMALLINT,
    CONVERTED_CNTR      SMALLINT
)

We salted the table (salt_buckets = 3) and created a secondary index (an immutable index) on all columns.
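
For reference, salting and a covered immutable index of the kind described here might be declared roughly as follows. This is a sketch, not the poster's actual DDL: `EXAMPLE_FACT` and `EXAMPLE_FACT_IDX` are hypothetical names, only four of the 21 columns are shown, and the split between indexed and included columns is an assumption.

```sql
-- Hypothetical names; a reduced column set stands in for the full 21-column table.
CREATE TABLE EXAMPLE_FACT (
    PKEY          BIGINT NOT NULL PRIMARY KEY,
    TIMEOFDAY_KEY BIGINT,
    SESSION_KEY   BIGINT,
    CUSTOMER_KEY  BIGINT
) SALT_BUCKETS = 3,       -- spread rows over 3 salt buckets
  IMMUTABLE_ROWS = true;  -- rows are written once, so indexes on the table are immutable

-- Assumed layout: index on the GROUP BY column, remaining columns carried
-- via INCLUDE so that queries can be answered from the index alone.
CREATE INDEX EXAMPLE_FACT_IDX
    ON EXAMPLE_FACT (TIMEOFDAY_KEY)
    INCLUDE (SESSION_KEY, CUSTOMER_KEY);
```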

We ran the following queries; the timings reported by the Squirrel client are listed with each one:

Select count(*) from *table_name* :   
Query Time (A) = 0.031 s  
Transport time (B) = 2.631 s  
Total Execution Time (A+B)  = 2.661 s  

Execution plan:

PLAN  
CLIENT 6-CHUNK PARALLEL 6-WAY FULL SCAN OVER OFR_FCT_IDX_SALTED  
SERVER FILTER BY FIRST KEY ONLY  
SERVER AGGREGATE INTO SINGLE ROW  
CLIENT 100 ROW LIMIT

select MAX(session_key) from *table_name* group by TIMEOFDAY_KEY having count(SESSION_KEY) > 100 order by TIMEOFDAY_KEY : Rows returned 431   
Query Time (A) = 0.04 s  
Transport time (B) = 11.894 s  
Total Execution Time (A+B)  = 11.934 s 

Execution plan:

PLAN  
CLIENT 6-CHUNK PARALLEL 6-WAY FULL SCAN OVER OFR_FCT_IDX_SALTED  
SERVER FILTER BY FIRST KEY ONLY  
SERVER AGGREGATE INTO DISTINCT ROWS BY ["TIMEOFDAY_KEY"]  
CLIENT MERGE SORT  
CLIENT FILTER BY COUNT(TO_BIGINT("SESSION_KEY")) > 100  
CLIENT SORTED BY ["TIMEOFDAY_KEY"]
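
Plans like the two shown above are obtained by prefixing a query with Phoenix's `EXPLAIN` keyword. A minimal sketch, with the hypothetical name `EXAMPLE_FACT` standing in for the elided table name:

```sql
-- EXAMPLE_FACT is a placeholder; substitute the real table name.
EXPLAIN SELECT COUNT(*) FROM EXAMPLE_FACT;

EXPLAIN
SELECT MAX(SESSION_KEY)
FROM EXAMPLE_FACT
GROUP BY TIMEOFDAY_KEY
HAVING COUNT(SESSION_KEY) > 100
ORDER BY TIMEOFDAY_KEY;
```

`EXPLAIN` only reports the plan and does not execute the query, so it will not reproduce the timings listed here.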

As you can see, the query time is very short, but the transport time (read/output time) seems very high.

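My questions are:

  1. Are these results in line with what we should expect for the dataset described, considering the latest performance test results (linked in the original post as "latest performance tests")?
  2. Can we somehow improve the transport time (read time) further?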

1 answer:

Answer 0: (score: 0)

Upgrade to at least Apache Phoenix 4.7. Apparently, they started using protobuf as the default serialization method (instead of JSON).