I am aggregating Twitter data by hour and want the percentage of tweets that fall in each hour. I can run:
SELECT CAST(substr(created_at, 12, 2) AS INT) AS hr,
       COUNT(substr(created_at, 12, 2)) AS cnt,
       ROUND(100 * (COUNT(substr(created_at, 12, 2)) / tot.total), 2) AS cntpercent
FROM tweets0,
     (SELECT COUNT(*) AS total FROM tweets0 WHERE racist = true) tot
WHERE racist = true
GROUP BY substr(created_at, 12, 2), tot.total
which gives:
hr cnt cntpercent
0 153 3.79
1 144 3.56 ...
Then I realized I was not accounting for utc_offset, so my SQL became horrible:
SELECT CAST(coalesce(pmod(CAST(substr(created_at, 12, 2) AS INT)
+ (user.utc_offset/3600),24), CAST(substr(created_at, 12, 2) AS INT)) AS INT)
, COUNT(CAST(coalesce(pmod(CAST(substr(created_at, 12, 2) AS INT)
+ (user.utc_offset/3600),24), CAST(substr(created_at, 12, 2) AS INT)) AS INT))
AS cnt ,
ROUND(100 * ( CAST(coalesce(pmod(CAST(substr(created_at, 12, 2) AS INT)
+ (user.utc_offset/3600),24), CAST(substr(created_at, 12, 2) AS INT)) AS INT)
/ tot.total),2) AS cntpercent
FROM tweets0, (select count(*) AS total from tweets0 WHERE racist = true) tot
WHERE racist = true
GROUP BY CAST(coalesce(pmod(CAST(substr(created_at, 12, 2) AS INT)
+ (user.utc_offset/3600),24), CAST(substr(created_at, 12, 2) AS INT)) AS INT)
, tot.total
The problem is that this gives what looks like a cumulative percentage:
_c0 cnt cntpercent
0 188 0.0
1 121 0.02
2 131 0.05
3 86 0.07 ...
I expect this is because the COALESCE is in the GROUP BY. Can anyone tell me how to fix this, or suggest a different approach?
Edit: following jarlh's suggestion, I now have a derived-table version with the same problem:
SELECT hr_adj, COUNT(hr_adj) AS cnt, ROUND(100 * hour.hr_adj/ tot.total,2) AS cntpercent
FROM tweets0, (select count(*) AS total from tweets0 WHERE racist = true) tot,
(SELECT CAST(coalesce(pmod(CAST(substr(created_at, 12, 2) AS INT)
+ (user.utc_offset/3600),24), CAST(substr(created_at, 12, 2) AS INT)) AS INT)
AS hr_adj
from tweets0 WHERE racist = true) hour WHERE racist = true
GROUP BY hour.hr_adj, tot.total
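A possible reading of the output, offered as a guess rather than a confirmed diagnosis: in both offset-adjusted queries the ROUND(...) numerator is the adjusted hour itself, not a COUNT of it. With roughly 4,000 matching tweets (153 is 3.79% of about 4,037), 100 * hour / total gives about 0.0, 0.02, 0.05, 0.07 for hours 0 to 3, which matches the column that looked cumulative. The derived-table version also lists tweets0 and the derived table in FROM with no join condition, so every tweet is paired with every adjusted hour. The sketch below assumes the same table and columns as above, computes the adjusted hour once in a derived table, and divides the per-hour row count by the total; the aliases h and tot are only illustrative, and it has not been run against the real data.

```sql
-- Sketch only: tweets0, created_at, user.utc_offset and racist are taken
-- from the queries above; h, hr_adj and tot are illustrative aliases.
SELECT h.hr_adj,
       COUNT(*) AS cnt,
       -- numerator is the number of rows in the hour, not the hour value
       ROUND(100 * COUNT(*) / tot.total, 2) AS cntpercent
FROM (
    -- compute the offset-adjusted hour once per matching tweet;
    -- fall back to the raw UTC hour when utc_offset is NULL
    SELECT CAST(coalesce(pmod(CAST(substr(created_at, 12, 2) AS INT)
                              + (user.utc_offset / 3600), 24),
                         CAST(substr(created_at, 12, 2) AS INT)) AS INT) AS hr_adj
    FROM tweets0
    WHERE racist = true
) h
CROSS JOIN (
    -- single-row total, so the cross join does not multiply rows
    SELECT COUNT(*) AS total FROM tweets0 WHERE racist = true
) tot
GROUP BY h.hr_adj, tot.total
```

Because tweets0 is read only inside the two subqueries, the outer query sees one row per matching tweet, and the cntpercent values should sum to roughly 100 (up to rounding).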