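I need to find and merge records in a time-based table. The table records user activity on a website (the start and end time of each activity).
I'm trying to merge any activities by the same user that fall within an hour of another activity into a single record. So if a record starts 55 minutes after the same user's previous activity ended, I merge the two into one record.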
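The merge rule can be written as a small predicate. This is a minimal Python sketch. It assumes "within an hour" means the gap between the earlier activity's updated_at and the next activity's created_at is strictly less than one hour. `MERGE_GAP` and `should_merge` are hypothetical names.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: activities closer together than this are merged.
MERGE_GAP = timedelta(hours=1)


def should_merge(prev_end: datetime, next_start: datetime) -> bool:
    """Return True if an activity starting at next_start belongs to the same
    user's previous activity that ended at prev_end.

    Overlapping activities (next_start before prev_end) give a negative gap
    and are therefore merged as well.
    """
    return next_start - prev_end < MERGE_GAP


end = datetime(2019, 6, 6, 8, 0, 0)
print(should_merge(end, end + timedelta(minutes=55)))  # True
print(should_merge(end, end + timedelta(minutes=61)))  # False
```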
I've tried various self-joins to achieve this, but the results never come out quite right.
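I have tried doing it in two steps:
1. First, update updated_at (the end of the activity) so that all records within an hour of each other get the same updated_at timestamp, namely the latest one in the group.
2. Then delete all the later records in the group, so that only the earliest one remains, now with the earliest created_at and the latest updated_at.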
-- First, give all of a user's activities that are less than an hour apart a common end time (updated_at)
UPDATE users_activity
SET updated_at = (
    SELECT a.LatestEnd FROM (
        SELECT
            UA1.id,
            MAX(UA2.updated_at) AS LatestEnd
        FROM users_activity UA1, users_activity UA2
        WHERE
            UA1.id <> UA2.id
            AND UA1.user_id = UA2.user_id
            AND UA1.created_at > DATE_SUB(UA2.updated_at, INTERVAL 1 HOUR)
            AND UA1.created_at < UA2.updated_at
    ) a
)
WHERE
    users_activity.id IN (
        SELECT b.id FROM (
            SELECT
                UA1.id
            FROM users_activity UA1, users_activity UA2
            WHERE
                UA1.id <> UA2.id
                AND UA1.user_id = UA2.user_id
                AND UA1.created_at > DATE_SUB(UA2.updated_at, INTERVAL 1 HOUR)
                AND UA1.created_at < UA2.updated_at
        ) b
    );
-- next delete all the later records in the group, leaving only the earliest
DELETE FROM users_activity
WHERE
    users_activity.id IN (
        SELECT * FROM (
            SELECT d.id FROM users_activity d
            INNER JOIN (
                SELECT
                    COUNT(CONCAT(user_id, '_', updated_at)) AS Duplicates,
                    CONCAT(user_id, '_', updated_at) AS UserVisitEnd,
                    id,
                    user_id,
                    MAX(created_at) AS LatestStart
                FROM users_activity
                GROUP BY UserVisitEnd
                HAVING Duplicates > 1
            ) a ON a.LatestStart = d.created_at AND a.user_id = d.user_id
        ) AS AllDupes
    );
If the data looks like this:
| id   | user_id | created_at          | updated_at          |
| 5788 | 1222    | 2019-06-06 08:55:28 | 2019-06-06 09:30:41 |
| 5787 | 3555    | 2019-06-06 08:40:04 | 2019-06-06 11:07:21 |
| 5786 | 1222    | 2019-06-06 07:11:03 | 2019-06-06 08:01:29 |
| 5785 | 7999    | 2019-06-05 18:11:03 | 2019-05-01 18:17:44 |
| 5784 | 3555    | 2019-06-04 16:53:32 | 2019-06-04 16:58:19 |
| 5783 | 9222    | 2019-04-01 15:21:32 | 2019-04-01 16:53:32 |
| 5782 | 1222    | 2019-03-29 14:02:09 | 2019-03-29 15:51:07 |
| 5774 | 1222    | 2019-03-29 13:38:43 | 2019-03-29 13:50:43 |
| 5773 | 7999    | 2018-09-23 17:38:35 | 2018-09-23 17:40:35 |
I should get the following result:
| id   | user_id | created_at          | updated_at          |
| 5787 | 3555    | 2019-06-06 08:40:04 | 2019-06-06 11:07:21 |
| 5786 | 1222    | 2019-06-06 07:11:03 | 2019-06-06 09:30:41 |
| 5785 | 7999    | 2019-06-05 18:11:03 | 2019-05-01 18:17:44 |
| 5784 | 3555    | 2019-06-04 16:53:32 | 2019-06-04 16:58:19 |
| 5783 | 9222    | 2019-04-01 15:21:32 | 2019-04-01 16:53:32 |
| 5774 | 1222    | 2019-03-29 13:38:43 | 2019-03-29 15:51:07 |
| 5773 | 7999    | 2018-09-23 17:38:35 | 2018-09-23 17:40:35 |
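For comparison, the expected result can be reproduced with a single sorted pass over the rows (a "gaps and islands" sweep) instead of a self-join. This is a swapped-in technique, not the approach tried above. It is a minimal Python sketch over the sample rows, assuming the strict under-one-hour gap rule. `merge_sessions`, `MERGE_GAP` and `FMT` are hypothetical names.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"
# Assumed rule: merge when the next activity of the same user starts less
# than one hour after the end of the merged activity built so far.
MERGE_GAP = timedelta(hours=1)

# Sample rows from the question: (id, user_id, created_at, updated_at).
ROWS = [
    (5788, 1222, "2019-06-06 08:55:28", "2019-06-06 09:30:41"),
    (5787, 3555, "2019-06-06 08:40:04", "2019-06-06 11:07:21"),
    (5786, 1222, "2019-06-06 07:11:03", "2019-06-06 08:01:29"),
    (5785, 7999, "2019-06-05 18:11:03", "2019-05-01 18:17:44"),
    (5784, 3555, "2019-06-04 16:53:32", "2019-06-04 16:58:19"),
    (5783, 9222, "2019-04-01 15:21:32", "2019-04-01 16:53:32"),
    (5782, 1222, "2019-03-29 14:02:09", "2019-03-29 15:51:07"),
    (5774, 1222, "2019-03-29 13:38:43", "2019-03-29 13:50:43"),
    (5773, 7999, "2018-09-23 17:38:35", "2018-09-23 17:40:35"),
]


def merge_sessions(rows):
    """Merge each user's activities in one sorted sweep.

    Each merged record keeps the id and created_at of its earliest row and
    the latest updated_at seen in its group.
    """
    ordered = sorted(rows, key=lambda r: (r[1], datetime.strptime(r[2], FMT)))
    merged = []
    current = None  # [id, user_id, start, end] of the group being built
    for row_id, user_id, start_s, end_s in ordered:
        start = datetime.strptime(start_s, FMT)
        end = datetime.strptime(end_s, FMT)
        if (current is not None and current[1] == user_id
                and start - current[3] < MERGE_GAP):
            current[3] = max(current[3], end)  # extend the running group
        else:
            if current is not None:
                merged.append(current)
            current = [row_id, user_id, start, end]  # start a new group
    if current is not None:
        merged.append(current)
    # Format back to strings and list the newest id first, as in the question.
    return sorted(
        ((i, u, s.strftime(FMT), e.strftime(FMT)) for i, u, s, e in merged),
        key=lambda r: r[0],
        reverse=True,
    )


for record in merge_sessions(ROWS):
    print(record)
```

Because rows are visited in start order and the running end only moves forward, a later row can never invalidate an earlier decision, so there is nothing to repeat.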
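New information: the following query gives me results that contain the information I need, namely the IDs of the sessions to update and merge. But how do I do a bulk update when updating one row can change the update that other rows need?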
SELECT b.id, b.user_id, b.created_at, b.updated_at, b.UpdatedAtOfSessionToMerge, b.IDofSessionToMerge
FROM (
    SELECT
        UA1.id,
        UA1.user_id,
        UA1.created_at,
        UA1.updated_at,
        UA2.updated_at AS UpdatedAtOfSessionToMerge,
        UA2.id AS IDofSessionToMerge
    FROM users_activity UA1, users_activity UA2
    WHERE
        UA1.id <> UA2.id
        AND UA1.user_id = UA2.user_id
        AND UA1.created_at > DATE_SUB(UA2.updated_at, INTERVAL 1 HOUR)
        AND UA1.updated_at < UA2.updated_at
        AND UA1.created_at < UA2.updated_at
) b
ORDER BY b.user_id;
Answer 0 (score: 0)
SELECT min(ID) as ID, User_ID, Min(Created_At) Created_At, Max(Updated_At) as Updated_At
FROM users_activity
GROUP BY User_ID, DATE_FORMAT(Created_At, "%Y%m%d%H");
This would be close, but I'm not sure I'm handling the "hour" aggregation the way you want.
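To see how the hour-bucket key behaves on the sample data, here is a minimal Python emulation of that GROUP BY key. It assumes Python's `strftime("%Y%m%d%H")` yields the same string as MySQL's `DATE_FORMAT(..., "%Y%m%d%H")` for these timestamps. `hour_bucket` is a hypothetical helper. Activities 5786 and 5788 of user 1222 get different keys, so the query keeps them in separate groups, while the expected result merges them.

```python
from datetime import datetime

# Two activities of user 1222 from the question's sample data; the second
# starts about 54 minutes after the first one ends.
first_start = datetime(2019, 6, 6, 7, 11, 3)    # id 5786
second_start = datetime(2019, 6, 6, 8, 55, 28)  # id 5788


def hour_bucket(ts: datetime) -> str:
    """Python counterpart of DATE_FORMAT(ts, "%Y%m%d%H") for these values."""
    return ts.strftime("%Y%m%d%H")


print(hour_bucket(first_start))   # 2019060607
print(hour_bucket(second_start))  # 2019060608
```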
Answer 1 (score: 0)
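You can group the dates by a parameter. Also, if you can, it is always good to order the data, for the sake of later processing speed; it also makes your query results easier to read.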
SELECT min(ID) as ID, User_ID, Min(Created_At) Created_At, Max(Updated_At) as Updated_At
FROM users_activity GROUP BY User_ID ORDER BY User_ID;
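A minimal Python emulation of this query on user 1222's rows from the sample data shows what grouping by User_ID alone does. The 2019-03-29 and 2019-06-06 sessions collapse into a single row, whereas the expected result keeps them as two. `group_by_user` is a hypothetical helper that models `MIN(id)`, `MIN(created_at)`, `MAX(updated_at)` with `GROUP BY user_id ... ORDER BY user_id`.

```python
from collections import defaultdict

# (id, user_id, created_at, updated_at) rows for user 1222 from the sample data.
ROWS = [
    (5788, 1222, "2019-06-06 08:55:28", "2019-06-06 09:30:41"),
    (5786, 1222, "2019-06-06 07:11:03", "2019-06-06 08:01:29"),
    (5782, 1222, "2019-03-29 14:02:09", "2019-03-29 15:51:07"),
    (5774, 1222, "2019-03-29 13:38:43", "2019-03-29 13:50:43"),
]


def group_by_user(rows):
    """Emulate MIN(id), MIN(created_at), MAX(updated_at) ... GROUP BY user_id."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[1]].append(row)
    result = []
    for user_id in sorted(groups):  # ORDER BY user_id
        g = groups[user_id]
        result.append((
            min(r[0] for r in g),
            user_id,
            # These fixed-format timestamp strings sort chronologically.
            min(r[2] for r in g),
            max(r[3] for r in g),
        ))
    return result


print(group_by_user(ROWS))
# [(5774, 1222, '2019-03-29 13:38:43', '2019-06-06 09:30:41')]
```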
Answer 2 (score: 0)
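This is a manual solution, good enough for a one-off cleanup of old session data. It uses two self-joins, so there may well be a more efficient approach.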
Step 1: find all the records that belong to one session and unify them by giving them the same session end value (updated_at).
UPDATE users_activity AS u1
JOIN (
    SELECT b.id, b.user_id, b.created_at, b.updated_at, b.UpdatedAtOfSessionToMerge, b.IDofSessionToMerge
    FROM (
        SELECT
            UA1.id,
            UA1.user_id,
            UA1.created_at,
            UA1.updated_at,
            UA2.updated_at AS UpdatedAtOfSessionToMerge,
            UA2.id AS IDofSessionToMerge
        FROM users_activity UA1, users_activity UA2
        WHERE
            UA1.id <> UA2.id
            AND UA1.user_id = UA2.user_id
            AND UA1.created_at > DATE_SUB(UA2.updated_at, INTERVAL 1 HOUR)
            AND UA1.updated_at < UA2.updated_at
            AND UA1.created_at < UA2.updated_at
    ) b
    ORDER BY b.user_id
) AS u2
    ON u1.id = u2.id
SET u1.updated_at = u2.UpdatedAtOfSessionToMerge;
Repeat this query until it affects no rows.
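The repetition is needed because of chains: when A merges into B and B merges into C, a single set-based UPDATE moves A's end only one hop, since every row is computed from the state before the statement runs. The sketch models this assumption: the joined derived table is evaluated once, before any row is written. Below is a minimal Python sketch of that behavior and of the "repeat until nothing changes" loop. It uses the one-hour gap rule from the question, not the exact join condition of the query above. The sessions A, B, C and the helpers `one_pass`, `update_until_stable` and `t` are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed rule from the question: a session merges into a later-starting one
# that begins less than one hour after the session's current end.
MERGE_GAP = timedelta(hours=1)


def one_pass(sessions):
    """Model one run of a set-based UPDATE for a single user.

    sessions maps id -> (start, end). Every row is computed from the same
    snapshot, so a row only sees the old end of the session it merges with.
    Returns (new_sessions, rows_changed).
    """
    updated = {}
    changed = 0
    for sid, (start, end) in sessions.items():
        new_end = end
        for other, (o_start, o_end) in sessions.items():
            if (other != sid and o_start > start
                    and o_start - end < MERGE_GAP and o_end > new_end):
                new_end = o_end  # keep the latest reachable end
        updated[sid] = (start, new_end)
        if new_end != end:
            changed += 1
    return updated, changed


def update_until_stable(sessions):
    """Re-run one_pass until it changes no rows; return (sessions, passes)."""
    passes = 0
    while True:
        sessions, changed = one_pass(sessions)
        passes += 1
        if changed == 0:
            return sessions, passes


def t(hour, minute):
    return datetime(2019, 6, 6, hour, minute)


# Hypothetical chain: A -> B are 30 min apart, B -> C are 45 min apart,
# but A -> C are two hours apart, so A only reaches C through B.
chain = {
    "A": (t(10, 0), t(10, 30)),
    "B": (t(11, 0), t(11, 45)),
    "C": (t(12, 30), t(13, 0)),
}

after_one, _ = one_pass(chain)
print(after_one["A"][1])  # 2019-06-06 11:45:00 (only one hop)
final, passes = update_until_stable(chain)
print(final["A"][1], passes)  # 2019-06-06 13:00:00 3
```

The last pass is the one that "affects no rows"; it is what tells you the chain has been fully resolved.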
Step 2: delete the unnecessary session records in each unified batch.
DELETE FROM users_activity
WHERE
    users_activity.id IN (
        SELECT * FROM (
            SELECT d.id FROM users_activity d
            INNER JOIN (
                SELECT
                    COUNT(CONCAT(user_id, '_', updated_at)) AS Duplicates,
                    CONCAT(user_id, '_', updated_at) AS UserVisitEnd,
                    id,
                    user_id,
                    MAX(created_at) AS LatestStart
                FROM users_activity
                GROUP BY UserVisitEnd
                HAVING Duplicates > 1
            ) a ON a.LatestStart = d.created_at AND a.user_id = d.user_id
        ) AS AllDupes
    );
Repeat this query until it affects no rows.
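Each run of the Step 2 DELETE removes only the row with the latest created_at in every duplicate (user_id, updated_at) group, which is why it has to be repeated. The following is a minimal Python model of that loop. It assumes Step 1 has already given each merged session a common updated_at. The rows below are user 1222's sample rows with that assumption applied by hand, so they are hypothetical. `delete_pass` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical post-Step-1 state: rows of one merged session share the same
# (user_id, updated_at). Rows are (id, user_id, created_at, updated_at).
rows = [
    (5788, 1222, "2019-06-06 08:55:28", "2019-06-06 09:30:41"),
    (5786, 1222, "2019-06-06 07:11:03", "2019-06-06 09:30:41"),
    (5782, 1222, "2019-03-29 14:02:09", "2019-03-29 15:51:07"),
    (5774, 1222, "2019-03-29 13:38:43", "2019-03-29 15:51:07"),
]


def delete_pass(rows):
    """One run of the Step 2 DELETE: in every (user_id, updated_at) group with
    more than one row, remove the row with the latest created_at.

    Returns (remaining_rows, rows_deleted). The timestamp strings use a fixed
    format, so comparing them as strings compares them chronologically.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row[1], row[3])].append(row)
    doomed = {max(g, key=lambda r: r[2])[0] for g in groups.values() if len(g) > 1}
    return [r for r in rows if r[0] not in doomed], len(doomed)


affected = None
while affected != 0:  # repeat until no rows are affected
    rows, affected = delete_pass(rows)

print(sorted(r[0] for r in rows))  # [5774, 5786]
```

A group of n rows therefore needs n - 1 runs that delete something, plus one final run that affects no rows.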