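I am trying to run some spot checks on my data by counting the rows in each partition, counting the "usage" I see per day, and counting the number of values I see per day.
I previously had a version of the following query working, but I must have changed something without realizing it: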
src as
(
select partition_date_column, count(*) as src_row_count
from database.table
where partition_date_column > '2016-01-01'
group by partition_date_column
)
,
pst as
(
select timestamp_pst as datevalue, count(*) as timestamp_row_count
from database.table
where partition_date_column > '2016-01-01'
and timestamp_pst between '2016-01-01' and '2017-07-01'
group by timestamp_pst
),
users as
(
select timestamp_pst as user_datevalue, count(*) as user_count
from database.table
where partition_date_column > '2016-01-01'
and timestamp_pst between '2016-01-01' and '2017-07-01'
and filter_column in ('filterA', 'filterB')
group by timestamp_pst
)
select datevalue as dayval, src_row_count, timestamp_row_count, user_count
from pst
left join src
on datevalue = partition_date_column
left join users
on datevalue = user_datevalue
order by dayval;
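I am not sure what formatting mistake I made that prevents Hive from recognizing this. I also suspect there is a better way to compute these three counts, even though they are grouped on different columns.
Answer 0 (score: 0)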
select pe.val as dt
,count(case when pe.pos = 0 then 1 end) as src_row_count
,count
(
case
when pe.pos = 1
and pe.val between date '2016-01-01' and date '2017-07-01'
then 1
end
) as timestamp_row_count
,count
(
case
when pe.pos = 1
and pe.val between date '2016-01-01' and date '2017-07-01'
and filter_column in ('filterA', 'filterB')
then 1
end
) as user_count
from database.table t
lateral view posexplode (array(partition_date_column,timestamp_pst)) pe
where partition_date_column > date '2016-01-01'
group by pe.val
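To see why a single scan can produce all three counts, it may help to look at what posexplode does to one row. The sketch below uses a made-up one-row inline subquery instead of database.table, and assumes both columns are of type DATE (array() needs a common element type) and a Hive version that allows a select without a from clause. Each input row becomes two rows: pos = 0 carries partition_date_column and pos = 1 carries timestamp_pst, which is what the pe.pos conditions in the query above rely on.

```sql
-- Hypothetical one-row input, used only for illustration.
select pe.pos, pe.val
from
(
select date '2016-01-02' as partition_date_column
,date '2016-01-01' as timestamp_pst
) t
lateral view posexplode (array(partition_date_column, timestamp_pst)) pe as pos, val;
-- pos = 0 -> value of partition_date_column
-- pos = 1 -> value of timestamp_pst
```

Grouping by pe.val then puts both kinds of dates on the same axis, and each count(case ...) picks out only the rows whose position matches.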
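Answer 1 (score: 0)
I figured it out. I was missing the "WITH" keyword at the start of the code, which is what allows several named select statements to be defined ahead of the final query.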
With src as
(
select partition_date_column, count(*) as src_row_count
from database.table
where partition_date_column > '2016-01-01'
group by partition_date_column
)
,
pst as
(
select timestamp_pst as datevalue, count(*) as timestamp_row_count
from database.table
where partition_date_column > '2016-01-01'
and timestamp_pst between '2016-01-01' and '2017-07-01'
group by timestamp_pst
),
users as
(
select timestamp_pst as user_datevalue, count(*) as user_count
from database.table
where partition_date_column > '2016-01-01'
and timestamp_pst between '2016-01-01' and '2017-07-01'
and filter_column in ('filterA', 'filterB')
group by timestamp_pst
)
select datevalue as dayval, src_row_count, timestamp_row_count, user_count
from pst
left join src
on datevalue = partition_date_column
left join users
on datevalue = user_datevalue
order by dayval;
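Stripped down, the shape of the fix looks like the sketch below. The names a and b and the literal values are made up for illustration, and it assumes a Hive version with common table expression support. The keyword appears once, before the first named subquery; every later one is introduced only by a comma.

```sql
-- WITH is written once; the second subquery follows after a comma.
with a as
(
select 1 as id, 'x' as a_val
)
,
b as
(
select 1 as id, 'y' as b_val
)
select a.id, a_val, b_val
from a
left join b
on a.id = b.id;
```

Without the leading keyword, the statement starts with "a as (", which Hive cannot parse as the beginning of a query.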