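I have a table with a structure like this:
id, href (string), active_users (integer), measured (datetime)
The table holds measurements of the number of active users on a given site at a given time. Measurements are taken hourly, so there is historical data: several rows can share the same href, but each has a different measured value.
For a given time interval, I want the maximum number of active users for each href, together with the time at which that peak was reached (for example: for this month, tell me what each href's peak of active users was and when it occurred).
The query I wrote initially looks like this: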
SELECT href,
max(active_users),
measured AS peak_date
FROM users_peak
WHERE measured >= DATE('2016-09-07')
AND measured <= DATE('2016-10-07')
GROUP BY href
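The problem is that the date in the result is not the date on which the peak was reached. If I check manually (SELECT * FROM users_peak WHERE href='one_of_the_hrefs_from_the_result' AND measured >= DATE('2016-09-07') AND measured <= DATE('2016-10-07')
and sort the rows by active_users), the row holding the peak has a different date. When I change the query to the following: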
SELECT od.href,
od.active_users,
oddate.measured AS measured
FROM users_peak AS oddate
JOIN
(SELECT href,
max(active_users) AS active_users
FROM users_peak
WHERE measured >= DATE('2016-09-07')
AND measured <= DATE('2016-10-07')
GROUP BY href) AS od ON od.href = oddate.href
WHERE oddate.active_users = od.active_users
AND measured >= DATE('2016-09-07')
AND measured <= DATE('2016-10-07')
GROUP BY od.href
it returns the correct result. Why does the initial query return a date that does not correspond to the peak?
Answer 0 (score: 0)
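What you really want is one row per href: the row that holds the peak value.
The initial query returns a non-matching date because measured is neither aggregated nor listed in GROUP BY, so a database that accepts such a query (MySQL with ONLY_FULL_GROUP_BY disabled, for example) may take its value from any row in the group.
You can keep your query and add an INNER JOIN
clause that returns only the rows holding the maximum (thereby filtering out the rows that do not represent the peak):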
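SELECT a.href,
a.active_users,
a.measured AS peak_date
FROM users_peak a
INNER JOIN (
-- Peak per href, computed inside the same interval as the outer query
SELECT href, MAX(active_users) AS max_users
FROM users_peak
WHERE measured >= DATE('2016-09-07')
AND measured <= DATE('2016-10-07')
GROUP BY href
-- Keep only the rows whose active_users equals that href's peak
) b ON a.href = b.href AND a.active_users = b.max_users
WHERE
a.measured >= DATE('2016-09-07')
AND a.measured <= DATE('2016-10-07')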
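To try the join-on-maximum approach on concrete data, here is a minimal sketch using Python's built-in sqlite3 module. Several things in it are assumptions rather than part of the question: SQLite stands in for whatever database is actually used, the sample rows and the example hrefs are invented, measured is stored as ISO-8601 text, and the helper names build_db and peaks are hypothetical.

```python
import sqlite3

# Hypothetical sample data (invented values): (href, active_users, measured).
ROWS = [
    ("a.example", 10, "2016-09-10 08:00:00"),
    ("a.example", 42, "2016-09-15 13:00:00"),  # peak for a.example
    ("a.example", 17, "2016-10-01 09:00:00"),
    ("b.example", 5, "2016-09-08 22:00:00"),
    ("b.example", 30, "2016-09-20 07:00:00"),  # peak for b.example in the interval
    ("b.example", 99, "2016-11-01 12:00:00"),  # higher, but outside the interval
]

# The subquery finds each href's peak inside the interval; the join keeps
# only the rows whose active_users equals that peak.
PEAK_QUERY = """
SELECT a.href, a.active_users, a.measured AS peak_date
FROM users_peak AS a
INNER JOIN (
    SELECT href, MAX(active_users) AS max_users
    FROM users_peak
    WHERE measured >= :start AND measured <= :end
    GROUP BY href
) AS b ON a.href = b.href AND a.active_users = b.max_users
WHERE a.measured >= :start AND a.measured <= :end
ORDER BY a.href
"""


def build_db():
    """Create an in-memory table shaped like the one in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users_peak ("
        "id INTEGER PRIMARY KEY, href TEXT, active_users INTEGER, measured TEXT)"
    )
    conn.executemany(
        "INSERT INTO users_peak (href, active_users, measured) VALUES (?, ?, ?)",
        ROWS,
    )
    return conn


def peaks(conn, start, end):
    """Return (href, peak, peak_date) rows for the interval [start, end]."""
    return conn.execute(PEAK_QUERY, {"start": start, "end": end}).fetchall()


if __name__ == "__main__":
    for row in peaks(build_db(), "2016-09-07", "2016-10-07"):
        print(row)
```

Note that if an href reaches the same peak value at several measurements within the interval, this join returns one row for each of them. Filtering the subquery by the same interval matters as well: without it, the b.example peak of 99 from November would be used for the comparison and that href would drop out of the result entirely.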