I'm trying to perform two distinct counts on different columns in a single query:
select count(distinct color) as cid,
count(distinct entity) as eid from my_table
The query above fails with the following error:
SQLException: [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException:
all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT color); deviating function: count(DISTINCT entity)
), Query: select count(distinct color) as cid,
count(distinct entity) as eid from my_table
However, if I do only one count, the query works. Why is that? Can I do both counts in one query? Thanks!
Answer 0 (score: 3)
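Impala does not currently support multiple COUNT(DISTINCT) expressions with different arguments in the same query; see IMPALA-110. It is a requested feature, but it is genuinely hard to implement, so it has not been added yet.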
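If you need exact counts on a version with this restriction, one common workaround (not part of the original answer, and offered here as a sketch) is to give each COUNT(DISTINCT) its own query block and cross join the one-row results. This assumes the restriction applies per query block, as the error message suggests; the trade-off is that my_table is scanned once per distinct count.

```sql
-- Exact distinct counts: each inline view is its own query block
-- containing a single COUNT(DISTINCT), so the restriction is not triggered.
-- Each inline view returns exactly one row, so the cross join yields one row.
select c.cid, e.eid
from (select count(distinct color) as cid from my_table) c
cross join (select count(distinct entity) as eid from my_table) e;
```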
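For now, if you don't need exact results, you can estimate the number of distinct values in a column with NDV(column); a query can contain multiple NDV(column) calls. To have Impala automatically rewrite COUNT(DISTINCT) expressions into NDV(), enable the APPX_COUNT_DISTINCT query option (see the documentation).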
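Both approaches from the paragraph above are shown here against the question's my_table. The results are estimates, not exact counts. The SET statement is shown as you would type it in impala-shell; how query options are passed over JDBC may depend on your driver and Impala version.

```sql
-- Approximate distinct counts: NDV() may appear several times in one query.
select ndv(color) as cid,
       ndv(entity) as eid
from my_table;

-- Alternatively, let Impala rewrite COUNT(DISTINCT ...) into NDV()
-- for the rest of the session, then run the original query unchanged.
set appx_count_distinct=true;
select count(distinct color) as cid,
       count(distinct entity) as eid
from my_table;
```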
Answer 1 (score: 0)
I'm not 100% sure this will run in Impala, but you can emulate count(distinct) with window functions and conditional aggregation. So this query:
select count(distinct color) as cid,
count(distinct entity) as eid
from my_table ;
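is equivalent to:
select sum(case when seqnum_color = 1 and color is not null then 1 else 0 end) as cid,
       sum(case when seqnum_entity = 1 and entity is not null then 1 else 0 end) as eid
from (select t.*,
             -- number the rows within each distinct value; exactly one row per value gets 1
             row_number() over (partition by color order by color) as seqnum_color,
             row_number() over (partition by entity order by entity) as seqnum_entity
      from my_table t
     ) t;
-- The "is not null" checks are needed because count(distinct) ignores NULLs,
-- whereas row_number() also gives the NULL partition a row with seqnum = 1.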
Answer 2 (score: 0)
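An update on this: Impala 3.1 (released in November 2018) added support for multiple distinct aggregate functions in the same query block, so the original query should run unchanged on Impala 3.1 and later.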