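I have a dataset with 4 (relevant) columns: a unique id, user_id, time_stamp and event. The unique id is the primary key, user_id can repeat, time_stamp (a datetime) is recorded when the event occurs, and event is either a) a push notification (push) or b) the user opening the app (open).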
It looks like this:
id  user_id  time_stamp  event  count it?
1   1        10          open
2   1        23          push   good
3   1        28          open
4   1        38          push   bad
5   1        65          open
6   1        85          push   good
7   1        89          open
8   1        28          push   bad
9   2        38          push   good
10  2        45          open
11  2        46          open
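I want to know whether my push notifications are useful. To find out, I need to check whether the user opens the app within 20 minutes of a push notification. I count that as a successful push, and every other push as unsuccessful. So far my idea has been to inner-join the table with itself, but I'm running into problems with duplicate rows. For example, I get a false positive for ID 4. The open with ID 3 happened before that push, so it should not be counted for it.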
SELECT * FROM
  (SELECT * FROM mytable WHERE event = 'open') a
INNER JOIN
  (SELECT * FROM mytable WHERE event = 'push') b
  ON a.user_id = b.user_id
WHERE a.time_stamp - b.time_stamp < 20;
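The sketch below shows the problem concretely. It loads the sample rows into an in-memory SQLite table using Python's standard `sqlite3` module. The table name `mytable` is an assumption for illustration, and so is treating `time_stamp` as plain integer minutes: with a real datetime column, the minute difference would need a database-specific function.

The sketch runs the join as written. It then runs a per-push `EXISTS` check that encodes the rule directly: a push succeeds if the same user has an `open` strictly after it and less than 20 minutes later.

```python
import sqlite3

# Sample rows from the question; time_stamp is treated as minutes (assumption).
ROWS = [
    (1, 1, 10, "open"), (2, 1, 23, "push"), (3, 1, 28, "open"),
    (4, 1, 38, "push"), (5, 1, 65, "open"), (6, 1, 85, "push"),
    (7, 1, 89, "open"), (8, 1, 28, "push"), (9, 2, 38, "push"),
    (10, 2, 45, "open"), (11, 2, 46, "open"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, "
             "user_id INTEGER, time_stamp INTEGER, event TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)

# The join from the question: there is no lower bound, so opens that happened
# *before* a push also match, and one push can match several opens.
naive = conn.execute("""
    SELECT b.id, a.id
    FROM (SELECT * FROM mytable WHERE event = 'open') a
    INNER JOIN (SELECT * FROM mytable WHERE event = 'push') b
      ON a.user_id = b.user_id
    WHERE a.time_stamp - b.time_stamp < 20
""").fetchall()

# One row per push: success is 1 if some open by the same user
# follows it (strictly later) by less than 20 minutes, else 0.
per_push = conn.execute("""
    SELECT p.id,
           EXISTS (SELECT 1 FROM mytable o
                   WHERE o.user_id = p.user_id
                     AND o.event = 'open'
                     AND o.time_stamp > p.time_stamp
                     AND o.time_stamp - p.time_stamp < 20) AS success
    FROM mytable p
    WHERE p.event = 'push'
    ORDER BY p.id
""").fetchall()

print(len(naive), sorted({push_id for push_id, _ in naive}))
print(per_push)
```

The join yields 12 (push, open) pairs, and every push (2, 4, 6, 8 and 9) appears at least once. The `EXISTS` version returns exactly one row per push and reproduces the "count it?" column: pushes 2, 6 and 9 succeed, while 4 and 8 do not.

Using `>` rather than `>=` means an open in the same minute as the push is not counted. With `>=`, push 8 (minute 28) would be credited with open 3.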
Answer 0 (score: 1)
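Since you have multiple records for the same user_id, I assume you want to take the latest 'open' time_stamp and compare it with the latest 'push' for each user?
If so, I think the following does what you want (it needs tidying up, but it should do the job):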
SELECT et4.user_id, ts1, et3.user_id, ts2
FROM
  (SELECT et1.user_id, MAX(et1.time_stamp) AS ts1
   FROM eventtable AS et1
   WHERE et1.event = 'push'
   GROUP BY et1.user_id) AS et4
INNER JOIN
  (SELECT et2.user_id, MAX(et2.time_stamp) AS ts2
   FROM eventtable AS et2
   WHERE et2.event = 'open'
   GROUP BY et2.user_id) AS et3
  ON et3.user_id = et4.user_id
WHERE ts2 - ts1 < 20
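Basically: select the latest push for each user, join it to that user's latest open, and then compute the difference between the two timestamps.
I hope this helps.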
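The sketch below runs that query unchanged against the question's sample rows, in an in-memory SQLite database via Python's `sqlite3` module. The table name `eventtable` comes from the query, and treating `time_stamp` as integer minutes is an assumption.

Each side is collapsed with `MAX(...) ... GROUP BY user_id`, so the result has at most one row per user. Only each user's latest push is evaluated, and earlier pushes such as ids 2 and 4 never get a good/bad verdict.

```python
import sqlite3

# Sample rows from the question; time_stamp is treated as minutes (assumption).
ROWS = [
    (1, 1, 10, "open"), (2, 1, 23, "push"), (3, 1, 28, "open"),
    (4, 1, 38, "push"), (5, 1, 65, "open"), (6, 1, 85, "push"),
    (7, 1, 89, "open"), (8, 1, 28, "push"), (9, 2, 38, "push"),
    (10, 2, 45, "open"), (11, 2, 46, "open"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eventtable (id INTEGER PRIMARY KEY, "
             "user_id INTEGER, time_stamp INTEGER, event TEXT)")
conn.executemany("INSERT INTO eventtable VALUES (?, ?, ?, ?)", ROWS)

# Latest push per user joined to latest open per user.
rows = conn.execute("""
    SELECT et4.user_id, ts1, et3.user_id, ts2
    FROM (SELECT et1.user_id, MAX(et1.time_stamp) AS ts1
          FROM eventtable AS et1
          WHERE et1.event = 'push'
          GROUP BY et1.user_id) AS et4
    INNER JOIN (SELECT et2.user_id, MAX(et2.time_stamp) AS ts2
                FROM eventtable AS et2
                WHERE et2.event = 'open'
                GROUP BY et2.user_id) AS et3
      ON et3.user_id = et4.user_id
    WHERE ts2 - ts1 < 20
""").fetchall()

# The query has no ORDER BY, so sort before displaying.
print(sorted(rows))
```

On this data it returns two rows: user 1's push at 85 with the open at 89, and user 2's push at 38 with the open at 46. That answers "did the last push work?" per user, not "which pushes worked?".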
Answer 1 (score: 0)
You could try something like this:
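SELECT t1.id, t1.user_id, t1.time_stamp, t1.event,
       t2.id, t2.time_stamp, t2.event
FROM mytable AS t1
-- t2: an open by the same user, after the push t1 and less than 20 minutes later
INNER JOIN mytable AS t2
  ON t1.user_id = t2.user_id
 AND t1.event = 'push' AND t2.event = 'open'
 AND t2.time_stamp > t1.time_stamp
 AND t2.time_stamp - t1.time_stamp < 20
-- t3: any other open by the same user between the push and t2
LEFT JOIN mytable AS t3
  ON t3.user_id = t2.user_id AND t3.event = 'open'
 AND t3.time_stamp > t1.time_stamp AND t3.time_stamp < t2.time_stamp
-- keep t2 only if no such t3 exists, i.e. t2 is the first open after the push
WHERE t3.id IS NULL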
Output:
id, user_id, time_stamp, event, id, time_stamp, event
=====================================================
2, 1, 23, push, 3, 28, open
8, 2, 28, push, 10, 45, open
9, 2, 38, push, 10, 45, open
Note: if you need to filter out the record with id = 8, you will need an additional LEFT JOIN.
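The sketch below shows one way the additional LEFT JOIN could look. A second anti-join (the alias `t4` is hypothetical) discards a push when the same user received another push after it but before the matched open.

The sketch assumes, as in the output above, that id 8 is a push by user_id 2 at time 28; the question's table lists it under user_id 1. The rows are loaded into an in-memory SQLite table via Python's `sqlite3` module, with integer minutes as the time unit.

```python
import sqlite3

# Sample rows, except id 8 is assigned to user_id 2 as in the answer's output.
ROWS = [
    (1, 1, 10, "open"), (2, 1, 23, "push"), (3, 1, 28, "open"),
    (4, 1, 38, "push"), (5, 1, 65, "open"), (6, 1, 85, "push"),
    (7, 1, 89, "open"), (8, 2, 28, "push"), (9, 2, 38, "push"),
    (10, 2, 45, "open"), (11, 2, 46, "open"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, "
             "user_id INTEGER, time_stamp INTEGER, event TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", ROWS)

# Joins from the answer: t2 is an open within 20 minutes after push t1,
# and t3 detects an earlier open in between.
FROM_AND_JOINS = """
FROM mytable AS t1
INNER JOIN mytable AS t2
  ON t1.user_id = t2.user_id
 AND t1.event = 'push' AND t2.event = 'open'
 AND t2.time_stamp > t1.time_stamp
 AND t2.time_stamp - t1.time_stamp < 20
LEFT JOIN mytable AS t3
  ON t3.user_id = t2.user_id AND t3.event = 'open'
 AND t3.time_stamp > t1.time_stamp AND t3.time_stamp < t2.time_stamp
"""

# Extra anti-join: another push by the same user between t1 and the open t2.
EXTRA_JOIN = """
LEFT JOIN mytable AS t4
  ON t4.user_id = t1.user_id AND t4.event = 'push'
 AND t4.time_stamp > t1.time_stamp AND t4.time_stamp < t2.time_stamp
"""

base = conn.execute(
    "SELECT t1.id, t2.id" + FROM_AND_JOINS +
    "WHERE t3.id IS NULL ORDER BY t1.id").fetchall()
extended = conn.execute(
    "SELECT t1.id, t2.id" + FROM_AND_JOINS + EXTRA_JOIN +
    "WHERE t3.id IS NULL AND t4.id IS NULL ORDER BY t1.id").fetchall()

print(base)      # (push id, open id) pairs without the extra join
print(extended)  # same, minus pushes superseded by a later push
```

Without the extra join, push 8 is paired with open 10. With it, that pair is dropped because push 9 arrived before the open at 45. The open is therefore credited only to the most recent push before it.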