SQL / Hive: How to calculate days to purchase

Posted: 2016-01-24 14:12:18

Tags: sql hadoop hive

SQL / Hive: I want to count how many days it takes visitors to make a purchase. This is what my data looks like:

date    visitor orders
1-Jan   A   0  
1-Jan   B   0  
4-Jan   B   1  
5-Jan   A   0  
12-Jan  A   1

This is the result I expect:

Days to purchase    count of visitors
0   0
1   0 
2   0
3   1
4   0
5   0
.   .
.   .
.   .
11  1
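To pin down what the expected output counts, here is a minimal plain-Python sketch (no SQL) that computes it from the sample rows. It assumes the dates fall in 2016 (the question shows only day and month) and that "days to purchase" means the days from a visitor's first row to their first row with orders > 0; the function names are hypothetical. Visitor A goes from 1 Jan to 12 Jan (11 days) and B from 1 Jan to 4 Jan (3 days), which gives the 3 and 11 buckets above.

```python
from datetime import date

# Sample rows from the question as (date, visitor, orders).
# The year 2016 is an assumption; the question only shows day and month.
ROWS = [
    (date(2016, 1, 1), "A", 0),
    (date(2016, 1, 1), "B", 0),
    (date(2016, 1, 4), "B", 1),
    (date(2016, 1, 5), "A", 0),
    (date(2016, 1, 12), "A", 1),
]


def days_to_purchase(rows):
    """Map each visitor who purchased to the days from first visit to first purchase."""
    first_visit = {}
    first_purchase = {}
    for day, visitor, orders in rows:
        # Earliest row of any kind counts as the first visit.
        if visitor not in first_visit or day < first_visit[visitor]:
            first_visit[visitor] = day
        # Earliest row with at least one order counts as the first purchase.
        if orders > 0 and (visitor not in first_purchase or day < first_purchase[visitor]):
            first_purchase[visitor] = day
    return {v: (first_purchase[v] - first_visit[v]).days for v in first_purchase}


def histogram(rows):
    """Return (days, visitor count) pairs from 0 up to the longest wait, zeros included."""
    per_visitor = days_to_purchase(rows)
    if not per_visitor:
        return []
    longest = max(per_visitor.values())
    counts = [0] * (longest + 1)
    for days in per_visitor.values():
        counts[days] += 1
    return list(enumerate(counts))


if __name__ == "__main__":
    for days, visitors in histogram(ROWS):
        print(days, visitors)
```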

Any help?

1 answer:

Answer 0 (score: 1)

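If I understand you correctly, the first step is to find the earliest date for each visitor + orders combination. `date` and `table` are reserved words in recent Hive versions, so the column is quoted with backticks and the table is called `visits` here (a placeholder for your own table name). This also assumes `date` holds real dates or 'yyyy-MM-dd' strings; with values like '1-Jan', min() compares them as text and datediff() cannot parse them.

select visitor, orders, min(`date`) as min_date from visits group by visitor, orders

The result should look like this:

visitor orders min_date
  A         0  1-Jan
  B         0  1-Jan
  B         1  4-Jan
  A         1  12-Jan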
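The first step can be tried outside Hive with Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite stands in for Hive, the table name `visits` and the helper names are hypothetical, and dates are stored as ISO 'YYYY-MM-DD' strings in 2016 so that MIN() compares them correctly.

```python
import sqlite3

# Sample data from the question; the year 2016 and ISO date strings are assumptions.
ROWS = [
    ("2016-01-01", "A", 0),
    ("2016-01-01", "B", 0),
    ("2016-01-04", "B", 1),
    ("2016-01-05", "A", 0),
    ("2016-01-12", "A", 1),
]


def load_sample():
    """Create an in-memory SQLite table named visits (hypothetical) holding ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (date TEXT, visitor TEXT, orders INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", ROWS)
    return conn


def min_date_per_visitor_orders(conn):
    """Earliest date for each (visitor, orders) pair, as in the first query."""
    return conn.execute(
        """
        SELECT visitor, orders, MIN(date) AS min_date
        FROM visits
        GROUP BY visitor, orders
        ORDER BY visitor, orders
        """
    ).fetchall()


if __name__ == "__main__":
    for row in min_date_per_visitor_orders(load_sample()):
        print(row)
```

The ORDER BY only makes the output order predictable; the four rows match the table above, with ISO dates.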

This table (let's call it `tbl`) can then be joined with itself to get each visitor's days to purchase:

-- Hive's datediff(enddate, startdate) returns the number of days from startdate to enddate
select A.visitor, datediff(B.purchase_date, A.first_visit) as days_to_purchase
from (select visitor, min_date as first_visit from tbl where orders = 0) A
inner join (select visitor, min_date as purchase_date from tbl where orders = 1) B
on A.visitor = B.visitor
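The self-join can be checked the same way. This sketch again uses SQLite in place of Hive, with the hypothetical `visits` table and ISO dates in 2016. SQLite has no datediff(), so the day difference is computed with julianday(), and the first step is supplied as a CTE named tbl.

```python
import sqlite3

# Sample data from the question; the year 2016 and ISO date strings are assumptions.
ROWS = [
    ("2016-01-01", "A", 0),
    ("2016-01-01", "B", 0),
    ("2016-01-04", "B", 1),
    ("2016-01-05", "A", 0),
    ("2016-01-12", "A", 1),
]

JOIN_QUERY = """
WITH tbl AS (
    SELECT visitor, orders, MIN(date) AS min_date
    FROM visits
    GROUP BY visitor, orders
)
SELECT A.visitor,
       -- julianday() difference stands in for Hive's datediff(end, start)
       CAST(julianday(B.purchase_date) - julianday(A.first_visit) AS INTEGER)
           AS days_to_purchase
FROM (SELECT visitor, min_date AS first_visit FROM tbl WHERE orders = 0) A
JOIN (SELECT visitor, min_date AS purchase_date FROM tbl WHERE orders = 1) B
  ON A.visitor = B.visitor
ORDER BY A.visitor
"""


def days_per_visitor(rows):
    """Load rows into an in-memory visits table and run the self-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (date TEXT, visitor TEXT, orders INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    return conn.execute(JOIN_QUERY).fetchall()


if __name__ == "__main__":
    print(days_per_visitor(ROWS))
```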

Now wrap this query in an outer query that counts the visitors with the same number of days:

 select days_to_purchase, count(visitor) as visitors from
 (select A.visitor, datediff(B.purchase_date, A.first_visit) as days_to_purchase
    from (select visitor, min_date as first_visit from tbl where orders = 0) A
    inner join (select visitor, min_date as purchase_date from tbl where orders = 1) B
    on A.visitor = B.visitor
) joined
group by days_to_purchase order by days_to_purchase
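Wrapping the join in the outer GROUP BY gives one row per distinct number of days. A sketch with the same assumptions (SQLite instead of Hive, hypothetical `visits` table, ISO dates in 2016, julianday() in place of datediff()):

```python
import sqlite3

# Sample data from the question; the year 2016 and ISO date strings are assumptions.
ROWS = [
    ("2016-01-01", "A", 0),
    ("2016-01-01", "B", 0),
    ("2016-01-04", "B", 1),
    ("2016-01-05", "A", 0),
    ("2016-01-12", "A", 1),
]

COUNT_QUERY = """
WITH tbl AS (
    SELECT visitor, orders, MIN(date) AS min_date
    FROM visits
    GROUP BY visitor, orders
),
joined AS (
    SELECT A.visitor,
           CAST(julianday(B.purchase_date) - julianday(A.first_visit) AS INTEGER)
               AS days_to_purchase
    FROM (SELECT visitor, min_date AS first_visit FROM tbl WHERE orders = 0) A
    JOIN (SELECT visitor, min_date AS purchase_date FROM tbl WHERE orders = 1) B
      ON A.visitor = B.visitor
)
-- outer query: number of visitors per days_to_purchase value
SELECT days_to_purchase, COUNT(visitor) AS visitors
FROM joined
GROUP BY days_to_purchase
ORDER BY days_to_purchase
"""


def visitors_per_day_count(rows):
    """Load rows into an in-memory visits table and run the counting query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (date TEXT, visitor TEXT, orders INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    return conn.execute(COUNT_QUERY).fetchall()


if __name__ == "__main__":
    print(visitors_per_day_count(ROWS))
```

With the sample data this returns the pairs (3, 1) and (11, 1).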
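
Note that this query only returns rows for days on which at least one visitor made a purchase; the zero-count days in your expected output will not appear. I hope I understood you correctly. I'm not sure this is the right solution, but you didn't give me much to start with :)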

The complete solution, with the first step inlined as a subquery, might be:

-- visits is a placeholder for your source table
select days_to_purchase, count(visitor) as visitors from
 (select A.visitor, datediff(B.purchase_date, A.first_visit) as days_to_purchase
    from
      (select visitor, min_date as first_visit from
         (select visitor, orders, min(`date`) as min_date from visits group by visitor, orders) tbl
       where orders = 0) A
    inner join
      (select visitor, min_date as purchase_date from
         (select visitor, orders, min(`date`) as min_date from visits group by visitor, orders) tbl
       where orders = 1) B
    on A.visitor = B.visitor
) joined
group by days_to_purchase order by days_to_purchase
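A variation, not the answer's approach: if "first visit" should include a visitor's first row even when they ordered on it, and any orders > 0 counts as a purchase, one pass of conditional aggregation per visitor can replace the self-join, and same-day purchases land in the 0 bucket. Left-joining a number series then produces the zero-count rows shown in the question's expected output. The sketch runs on SQLite with the hypothetical `visits` table and ISO dates in 2016; the series is built with SQLite's WITH RECURSIVE, and in Hive you would likely need another source of numbers (for example a small helper table), which this sketch does not cover.

```python
import sqlite3

# Sample data from the question; the year 2016 and ISO date strings are assumptions.
ROWS = [
    ("2016-01-01", "A", 0),
    ("2016-01-01", "B", 0),
    ("2016-01-04", "B", 1),
    ("2016-01-05", "A", 0),
    ("2016-01-12", "A", 1),
]

HISTOGRAM_QUERY = """
WITH RECURSIVE
per_visitor AS (
    -- first row of any kind vs. first row with at least one order
    SELECT visitor,
           MIN(date) AS first_visit,
           MIN(CASE WHEN orders > 0 THEN date END) AS first_purchase
    FROM visits
    GROUP BY visitor
),
days AS (
    SELECT CAST(julianday(first_purchase) - julianday(first_visit) AS INTEGER) AS d
    FROM per_visitor
    WHERE first_purchase IS NOT NULL  -- visitors who never purchased are skipped
),
bounds AS (
    SELECT MAX(d) AS max_d FROM days
),
series(n) AS (
    -- generates 0, 1, ..., max_d
    SELECT 0
    UNION ALL
    SELECT n + 1 FROM series, bounds WHERE n < bounds.max_d
)
-- LEFT JOIN keeps days nobody hit; COUNT(days.d) is 0 for them
SELECT series.n AS days_to_purchase, COUNT(days.d) AS visitors
FROM series
LEFT JOIN days ON days.d = series.n
GROUP BY series.n
ORDER BY series.n
"""


def histogram(rows):
    """Load rows into an in-memory visits table and return (days, visitors) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (date TEXT, visitor TEXT, orders INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    return conn.execute(HISTOGRAM_QUERY).fetchall()


if __name__ == "__main__":
    for days, visitors in histogram(ROWS):
        print(days, visitors)
```

The conditional MIN works because CASE without ELSE yields NULL for rows without orders, and MIN ignores NULLs.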