SQL, Impala: Why can't I do two counts in one query?

Time: 2017-02-27 23:58:32

Tags: sql count impala

I am trying to run two distinct counts, over different columns, in a single query:

select  count(distinct color) as cid,
  count(distinct entity) as eid from my_table 

The query above fails with the following error:

SQLException: [Simba][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: 
all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT color); deviating function: count(DISTINCT entity)
), Query: select  count(distinct color) as cid,
  count(distinct entity) as eid from my_table

However, if I do only one count, the query works fine. Why is that? Can I do two counts in one query? Thanks!

3 Answers:

Answer 0: (score: 3)

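Impala does not currently support multiple COUNT(DISTINCT) expressions over different columns in the same query; see IMPALA-110. It is a frequently requested feature, but it is hard to implement, so it has not been added yet.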

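For now, if you do not need an exact result, you can estimate the number of distinct values in a column with NDV(column); a query can contain multiple instances of NDV(column). To have Impala automatically rewrite COUNT(DISTINCT) expressions to NDV(), enable the APPX_COUNT_DISTINCT query option (see the documentation).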
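As a minimal sketch of both approaches, using the table and column names from the question (the `_approx` aliases are made up for illustration), the approximate counts could be written as follows. The results are estimates and may differ from the exact COUNT(DISTINCT) values.

```sql
-- Option 1: call NDV() directly; several NDV() calls may appear in one query.
select ndv(color)  as cid_approx,
       ndv(entity) as eid_approx
from my_table;

-- Option 2: enable the query option for the session (e.g. in impala-shell),
-- so that COUNT(DISTINCT ...) is rewritten to NDV() automatically.
set appx_count_distinct=true;

select count(distinct color)  as cid,
       count(distinct entity) as eid
from my_table;
```

With the option enabled, the original query from the question should run unchanged, at the cost of returning approximate counts.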

Answer 1: (score: 0)

I am not 100% sure this will work in Impala, but you can emulate count(distinct) with window functions and conditional aggregation. So, this query:

select count(distinct color) as cid,
       count(distinct entity) as eid
from my_table ;

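is equivalent to the following. The NULL checks are needed because count(distinct) ignores NULLs, whereas row_number() also numbers the rows in the NULL partition:

select sum(case when seqnum_color = 1 and color is not null then 1 else 0 end) as cid,
       sum(case when seqnum_entity = 1 and entity is not null then 1 else 0 end) as eid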
from (select t.*, 
             row_number() over (partition by color order by color) as seqnum_color,
             row_number() over (partition by entity order by entity) as seqnum_entity
      from my_table t
     ) t;
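Another exact workaround, sketched here under the assumption that the restriction applies per query block (as the error message suggests), is to compute each distinct count in its own subquery and combine the two single-row results with a cross join. The aliases `c` and `e` are arbitrary.

```sql
-- Each subquery contains only one DISTINCT aggregate, so each block is valid
-- on its own; the cross join of two one-row results yields a single row.
select c.cid, e.eid
from (select count(distinct color)  as cid from my_table) c
cross join
     (select count(distinct entity) as eid from my_table) e;
```

This scans my_table twice, so it trades some performance for exact results.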

Answer 2: (score: 0)

An update on this: Impala 3.1 (released in November 2018) added support for multiple distinct aggregate functions within a single query block.