Running multiple counts and joining the results

Time: 2017-06-15 17:29:46

Tags: sql hive grouping counting

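I'm trying to run some spot checks on my data: counting the rows in each partition, counting the "usage" I see per day, and counting the number of values I see per day.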

I was previously able to get a version of the query below working, but I must have changed something without realizing it:

src as
(
   select partition_date_column, count(*) as src_row_count
   from database.table
   where partition_date_column > '2016-01-01' 
   group by partition_date_column
)

,
pst as
(
  select timestamp_pst as datevalue, count(*) as timestamp_row_count
  from database.table
  where partition_date_column > '2016-01-01'
  and timestamp_pst between '2016-01-01' and '2017-07-01'
  group by timestamp_pst
),

users as
(
  select timestamp_pst as user_datevalue, count(*) as user_count
  from database.table
  where partition_date_column > '2016-01-01'
  and timestamp_pst between '2016-01-01' and '2017-07-01'
  and filter_column in ('filterA', 'filterB')
  group by timestamp_pst
)

select datevalue as dayval, src_row_count, timestamp_row_count, user_count
from pst
left join src
on datevalue = partition_date_column
left join users
on datevalue = user_datevalue
order by dayval;

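I can't tell what formatting mistake is causing Hive not to recognize this. I also suspect there may be a better way to compute these three counts, even though they are grouped on different columns.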

2 answers:

Answer 0 (score: 0)

select      pe.val  as dt

           ,count(case when pe.pos = 0 then 1 end)  as src_row_count

           ,count
            (
                case  
                    when    pe.pos = 1 
                        and pe.val between date '2016-01-01' and date '2017-07-01' 
                    then    1 
                end
            ) as    timestamp_row_count 

           ,count
            (
                case  
                    when    pe.pos = 1 
                        and pe.val between date '2016-01-01' and date '2017-07-01' 
                        and filter_column in ('filterA', 'filterB')
                    then    1 
                end
            ) as    user_count

from        database.table  t
            lateral view posexplode (array(partition_date_column,timestamp_pst)) pe

where       partition_date_column > date '2016-01-01' 

group by    pe.val
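This answer reads the table once instead of three times: posexplode turns each row into two (pos, val) rows, pos 0 carrying partition_date_column and pos 1 carrying timestamp_pst, and each conditional count(case ...) picks out the copy it should see. SQLite has no lateral view or posexplode, so the minimal sketch below swaps in a UNION ALL that produces the same two tagged copies per row (a substitution, not the answer's exact query, and unlike posexplode it scans the table twice). It assumes Python's built-in sqlite3 module as a stand-in for Hive; the table t and its rows are hypothetical.

```python
import sqlite3

# Hypothetical sample rows: (partition_date_column, timestamp_pst, filter_column).
# SQLite stands in for Hive; dates are ISO-8601 text so string comparison
# orders them correctly.
ROWS = [
    ("2016-02-01", "2016-02-01", "filterA"),
    ("2016-02-01", "2016-02-01", "filterC"),
    ("2016-02-01", "2016-02-02", "filterB"),
    ("2016-02-02", "2016-02-02", "filterA"),
]

# Every base row yields two tagged copies: pos 0 carries partition_date_column,
# pos 1 carries timestamp_pst. The UNION ALL replaces Hive's
# posexplode(array(...)), which SQLite does not have.
SINGLE_PASS_SQL = """
with pe as (
  select 0 as pos, partition_date_column as val, filter_column
  from t
  where partition_date_column > '2016-01-01'
  union all
  select 1 as pos, timestamp_pst as val, filter_column
  from t
  where partition_date_column > '2016-01-01'
)
select val as dt,
       count(case when pos = 0 then 1 end) as src_row_count,
       count(case when pos = 1
                   and val between '2016-01-01' and '2017-07-01'
                  then 1 end) as timestamp_row_count,
       count(case when pos = 1
                   and val between '2016-01-01' and '2017-07-01'
                   and filter_column in ('filterA', 'filterB')
                  then 1 end) as user_count
from pe
group by val
order by val
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory table named t holding the hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table t (partition_date_column text, timestamp_pst text, filter_column text)"
    )
    conn.executemany("insert into t values (?, ?, ?)", ROWS)
    return conn


def single_pass_counts(conn: sqlite3.Connection) -> list:
    """Return (dt, src_row_count, timestamp_row_count, user_count) per day."""
    return conn.execute(SINGLE_PASS_SQL).fetchall()


if __name__ == "__main__":
    conn = make_db()
    for row in single_pass_counts(conn):
        print(row)
    conn.close()
```

Because it groups on val rather than left-joining from pst, this form also returns days that appear only as a partition date (with timestamp_row_count of 0), which the question's pst-driven join would drop.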

Answer 1 (score: 0)

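I figured it out. I was missing "WITH" at the beginning of the code; it is what allows the multiple named select statements that follow.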

With src as
(
   select partition_date_column, count(*) as src_row_count
   from database.table
   where partition_date_column > '2016-01-01' 
   group by partition_date_column
)

,
pst as
(
  select timestamp_pst as datevalue, count(*) as timestamp_row_count
  from database.table
  where partition_date_column > '2016-01-01'
  and timestamp_pst between '2016-01-01' and '2017-07-01'
  group by timestamp_pst
),

users as
(
  select timestamp_pst as user_datevalue, count(*) as user_count
  from database.table
  where partition_date_column > '2016-01-01'
  and timestamp_pst between '2016-01-01' and '2017-07-01'
  and filter_column in ('filterA', 'filterB')
  group by timestamp_pst
)

select datevalue as dayval, src_row_count, timestamp_row_count, user_count
from pst
left join src
on datevalue = partition_date_column
left join users
on datevalue = user_datevalue
order by dayval;
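This fix can be checked outside Hive. The minimal sketch below assumes Python's built-in sqlite3 module as a stand-in for Hive (SQLite also requires WITH before a list of named subqueries, though its error messages differ from Hive's) and uses a hypothetical table t with made-up rows under the question's column names. It runs the query above once with the leading with and once without it.

```python
import sqlite3

# Hypothetical sample rows: (partition_date_column, timestamp_pst, filter_column).
ROWS = [
    ("2016-02-01", "2016-02-01", "filterA"),
    ("2016-02-01", "2016-02-01", "filterC"),
    ("2016-02-01", "2016-02-02", "filterB"),
    ("2016-02-02", "2016-02-02", "filterA"),
]

# The three named subqueries and the final select from the query above, with
# database.table renamed to t. run() decides whether a leading WITH is added.
CTE_BODY = """
src as (
  select partition_date_column, count(*) as src_row_count
  from t
  where partition_date_column > '2016-01-01'
  group by partition_date_column
),
pst as (
  select timestamp_pst as datevalue, count(*) as timestamp_row_count
  from t
  where partition_date_column > '2016-01-01'
    and timestamp_pst between '2016-01-01' and '2017-07-01'
  group by timestamp_pst
),
users as (
  select timestamp_pst as user_datevalue, count(*) as user_count
  from t
  where partition_date_column > '2016-01-01'
    and timestamp_pst between '2016-01-01' and '2017-07-01'
    and filter_column in ('filterA', 'filterB')
  group by timestamp_pst
)
select datevalue as dayval, src_row_count, timestamp_row_count, user_count
from pst
left join src on datevalue = partition_date_column
left join users on datevalue = user_datevalue
order by dayval
"""


def run(with_keyword: bool) -> list:
    """Build the sample table in memory and execute the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table t (partition_date_column text, timestamp_pst text, filter_column text)"
        )
        conn.executemany("insert into t values (?, ?, ?)", ROWS)
        sql = ("with " if with_keyword else "") + CTE_BODY
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run(with_keyword=True))
    try:
        run(with_keyword=False)
    except sqlite3.OperationalError as exc:
        # Without WITH, the statement does not parse and SQLite raises a syntax error.
        print("without WITH:", exc)
```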