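I am building a query for a logging system. The table contains roughly 100,000 rows, and I want to remove duplicates on the following columns, returning only the most recent entry for each combination.
Columns to de-duplicate on:
The goal is to see which sections of the site a user has visited. We don't need to know that they visited a particular page 100 times; we only need to know that they visited it at least once.
The table contains the following columns: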
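My question is: why do the following two queries return such drastically different results? The MAX(id) query returns 450 rows, while the MAX(time_accessed) query returns 835. Shouldn't they return the same number?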
-- Query 1: filter on MAX(id) per group
SELECT DISTINCT mainlocation, secondlocation, thirdlocation, ip, user, did_user_have_access, time_accessed
FROM `log_table`
WHERE `id` IN (SELECT MAX(id) AS id
               FROM `log_table`
               GROUP BY `mainlocation`, `secondlocation`, `thirdlocation`, `ip`, `user`, `did_user_have_access`)
ORDER BY `log_table`.`time_accessed` DESC;
-- Query 2: filter on MAX(time_accessed) per group
SELECT DISTINCT mainlocation, secondlocation, thirdlocation, ip, user, did_user_have_access, time_accessed
FROM `log_table`
WHERE `time_accessed` IN (SELECT MAX(`time_accessed`) AS time_accessed
                          FROM `log_table`
                          GROUP BY `mainlocation`, `secondlocation`, `thirdlocation`, `ip`, `user`, `did_user_have_access`)
ORDER BY `log_table`.`time_accessed` DESC;
Answer 0 (score: 0)
Without knowing how you populate the two fields you are applying MAX() to, this is hard to answer; anything I say would be a guess.
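One guess that would fit the numbers, assuming `id` is a unique key while `time_accessed` values can repeat across rows: the `IN` subquery is not correlated with the outer row, so the second query keeps any row whose timestamp equals the maximum of any group, not just its own group. The sketch below reproduces that effect on a tiny, made-up table. It uses Python's `sqlite3` in place of MySQL and only two of the six grouping columns; the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL table (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE log_table (
        id INTEGER PRIMARY KEY,
        mainlocation TEXT,
        user TEXT,
        time_accessed INTEGER
    )
""")
rows = [
    (1, "home", "alice", 10),  # alice's only visit, so her maximum is 10
    (2, "home", "bob", 10),    # bob's older visit, same timestamp as alice's maximum
    (3, "home", "bob", 20),    # bob's latest visit
]
conn.executemany("INSERT INTO log_table VALUES (?, ?, ?, ?)", rows)

# Filter on MAX(id): ids are unique, so exactly one row per group survives.
by_id = conn.execute("""
    SELECT DISTINCT mainlocation, user, time_accessed
    FROM log_table
    WHERE id IN (SELECT MAX(id) FROM log_table GROUP BY mainlocation, user)
    ORDER BY time_accessed DESC
""").fetchall()

# Filter on MAX(time_accessed): bob's older row also matches, because its
# timestamp (10) is the maximum of a different group (alice's).
by_time = conn.execute("""
    SELECT DISTINCT mainlocation, user, time_accessed
    FROM log_table
    WHERE time_accessed IN (SELECT MAX(time_accessed) FROM log_table
                            GROUP BY mainlocation, user)
    ORDER BY time_accessed DESC
""").fetchall()

print(len(by_id), len(by_time))
```

If the real table behaves the same way, the larger count from the `time_accessed` query would come from these cross-group timestamp collisions rather than from extra groups.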
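Then again, does it matter, as long as you are getting the correct results?
Also, you don't have to split this into two queries. Because you are grouping by exactly the fields you want de-duplicated, each combination is guaranteed to appear only once, so you can use MAX() directly in the main query: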
SELECT mainlocation, secondlocation, thirdlocation,
       ip, user, did_user_have_access,
       MAX(`time_accessed`) AS last_accessed
FROM log_table
GROUP BY mainlocation, secondlocation, thirdlocation,
         ip, user, did_user_have_access;
In other words, each six-tuple for each last_accessed
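As a rough check of the single-query approach, the sketch below runs the same GROUP BY on a small in-memory table. It assumes Python's `sqlite3` as a stand-in for MySQL, made-up sample rows, and timestamps stored as ISO-formatted text so that MAX() compares them chronologically.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL table (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE log_table (
        id INTEGER PRIMARY KEY,
        mainlocation TEXT, secondlocation TEXT, thirdlocation TEXT,
        ip TEXT, user TEXT, did_user_have_access INTEGER,
        time_accessed TEXT
    )
""")

# Hypothetical sample data: alice hits the same page three times, bob once.
sample = [
    ("admin", "users", "edit", "10.0.0.1", "alice", 1, "2020-01-01 09:00:00"),
    ("admin", "users", "edit", "10.0.0.1", "alice", 1, "2020-01-02 09:00:00"),
    ("admin", "users", "edit", "10.0.0.1", "alice", 1, "2020-01-03 09:00:00"),
    ("shop", "cart", "view", "10.0.0.2", "bob", 0, "2020-01-02 09:00:00"),
]
conn.executemany(
    "INSERT INTO log_table (mainlocation, secondlocation, thirdlocation, ip, user,"
    " did_user_have_access, time_accessed) VALUES (?, ?, ?, ?, ?, ?, ?)",
    sample,
)

# One row per six-column combination, carrying that combination's latest timestamp.
latest = conn.execute("""
    SELECT mainlocation, secondlocation, thirdlocation,
           ip, user, did_user_have_access,
           MAX(time_accessed) AS last_accessed
    FROM log_table
    GROUP BY mainlocation, secondlocation, thirdlocation,
             ip, user, did_user_have_access
    ORDER BY last_accessed DESC
""").fetchall()

for row in latest:
    print(row)
```

Alice's three visits collapse into a single row with her latest timestamp, and bob's single visit is returned unchanged, so the result has one row per combination.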