How do I use COALESCE with GROUP BY in Hive SQL?

Time: 2018-01-18 12:42:23

Tags: sql hive

I am aggregating Twitter data by hour and want the percentage of tweets that falls in each hour. I can run:

SELECT CAST(substr(created_at, 12, 2) AS INT) AS hr,
       COUNT(substr(created_at, 12, 2)) AS cnt,
       ROUND(100 * (COUNT(substr(created_at, 12, 2)) / tot.total), 2) AS cntpercent

FROM tweets0,
     (SELECT COUNT(*) AS total FROM tweets0 WHERE racist = true) tot
WHERE racist = true

GROUP BY substr(created_at, 12, 2), tot.total

which gives:

hr  cnt cntpercent
0   153 3.79
1   144 3.56 ...

Then I realized I had not accounted for utc_offset, so my SQL became horrible:

SELECT CAST(coalesce(pmod(CAST(substr(created_at, 12, 2)  AS INT) 
+ (user.utc_offset/3600),24), CAST(substr(created_at, 12, 2)  AS INT)) AS INT)
, COUNT(CAST(coalesce(pmod(CAST(substr(created_at, 12, 2)  AS INT) 
+ (user.utc_offset/3600),24), CAST(substr(created_at, 12, 2)  AS INT)) AS INT)) 
AS cnt , 
ROUND(100 * ( CAST(coalesce(pmod(CAST(substr(created_at, 12, 2)  AS INT) 
+ (user.utc_offset/3600),24), CAST(substr(created_at, 12, 2)  AS INT)) AS INT) 
             / tot.total),2) AS cntpercent

FROM tweets0, (select count(*) AS total from tweets0 WHERE racist = true) tot 
WHERE racist = true

GROUP BY CAST(coalesce(pmod(CAST(substr(created_at, 12, 2)  AS INT) 
+ (user.utc_offset/3600),24), CAST(substr(created_at, 12, 2)  AS INT)) AS INT)
, tot.total

The problem is that this gives what looks like a cumulative percentage:

_c0 cnt cntpercent
0   188 0.0
1   121 0.02
2   131 0.05
3   86  0.07 ...

I suspect this happens because the COALESCE is in the GROUP BY. Can anyone tell me how to fix this, or suggest a different approach?
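
A quick arithmetic check suggests the cntpercent values may not be cumulative at all. They match 100 * hour / total, which is what the ROUND(...) expression in the second query computes: it divides the adjusted hour by tot.total rather than dividing a COUNT. The Python sketch below assumes a total of about 4040 tweets, estimated from the first output (153 / 3.79%); that figure is an assumption, not a value stated in the question.

```python
# Assumed total, estimated from the first result set (153 / 0.0379 is about 4037).
total = 4040

hours = [0, 1, 2, 3]           # first column of the second output
counts = [188, 121, 131, 86]   # cnt column of the second output

# What the second query's ROUND(...) computes: 100 * hr_adj / total
by_hour = [round(100 * h / total, 2) for h in hours]

# What was presumably intended: 100 * COUNT / total
by_count = [round(100 * c / total, 2) for c in counts]

print(by_hour)   # [0.0, 0.02, 0.05, 0.07], the same as the reported cntpercent
print(by_count)
```

If this reading is right, the COALESCE in the GROUP BY is not the cause; the numerator of the percentage is.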

Edit: following jarlh's suggestion, I now have a derived-table version with the same problem:

SELECT hr_adj, COUNT(hr_adj) AS cnt, ROUND(100 * hour.hr_adj/ tot.total,2) AS cntpercent

FROM tweets0, (select count(*) AS total from tweets0 WHERE racist = true) tot, 
(SELECT CAST(coalesce(pmod(CAST(substr(created_at, 12, 2)  AS INT) 
+ (user.utc_offset/3600),24), CAST(substr(created_at, 12, 2)  AS INT)) AS INT) 
AS hr_adj
 from tweets0 WHERE racist = true) hour WHERE racist = true

GROUP BY hour.hr_adj, tot.total
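
Two things stand out in the derived-table version. Its percentage still divides hour.hr_adj (the hour value) by tot.total instead of a COUNT, and tweets0 appears in the FROM list next to the derived table with no join condition, which cross-joins every adjusted hour against every matching tweet and would inflate cnt. The sketch below shows one possible shape for the query, using Python's sqlite3 module as a stand-in for Hive so that it can be run locally. Everything in it is an assumption for illustration: the toy rows are invented, utc_offset is a flat column rather than user.utc_offset, racist is stored as 0/1, the alias h replaces hour, and Hive's pmod is emulated with a double modulo because SQLite has no pmod.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets0 (created_at TEXT, utc_offset INTEGER, racist INTEGER)"
)
# Invented sample rows; created_at follows the Twitter timestamp layout,
# so characters 12-13 hold the UTC hour.
conn.executemany(
    "INSERT INTO tweets0 VALUES (?, ?, ?)",
    [
        ("Thu Jan 18 12:42:23 +0000 2018", None, 1),    # no offset: stays 12
        ("Thu Jan 18 12:05:00 +0000 2018", 0, 1),       # 12 + 0  -> 12
        ("Thu Jan 18 02:10:00 +0000 2018", -18000, 1),  # 2 - 5   -> 21
        ("Thu Jan 18 23:59:59 +0000 2018", 3600, 1),    # 23 + 1  -> 0
        ("Thu Jan 18 08:00:00 +0000 2018", 3600, 0),    # filtered out
    ],
)

query = """
SELECT h.hr_adj,
       COUNT(*) AS cnt,
       ROUND(100.0 * COUNT(*) / tot.total, 2) AS cntpercent  -- count, not hour
FROM (
    -- Compute the adjusted hour once; fall back to the UTC hour when the
    -- offset is NULL. ((x % 24) + 24) % 24 emulates Hive's pmod(x, 24).
    SELECT COALESCE(
               ((CAST(substr(created_at, 12, 2) AS INTEGER)
                 + utc_offset / 3600) % 24 + 24) % 24,
               CAST(substr(created_at, 12, 2) AS INTEGER)
           ) AS hr_adj
    FROM tweets0
    WHERE racist = 1
) h
CROSS JOIN (SELECT COUNT(*) AS total FROM tweets0 WHERE racist = 1) tot
GROUP BY h.hr_adj, tot.total
ORDER BY h.hr_adj
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

In Hive the inner expression would presumably stay as written in the question (pmod, user.utc_offset, racist = true); the parts this sketch is meant to illustrate are the single derived table for hr_adj and the COUNT(*) in the numerator.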

0 answers:

No answers yet