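I have an old forum containing threads with duplicate first posts (though the replies may differ). I want to delete all but one thread from each set of duplicates, keeping the thread with the highest view count.
I have the following SQL query that helps identify the duplicate threads, but I can't find a way to make it list only the duplicate with the lowest value in the xf_thread.view_count column: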
SELECT
t.thread_id, MIN(t.view_count)
FROM
xf_thread t
INNER JOIN
xf_post p ON p.thread_id = t.thread_id
WHERE
t.first_post_id = p.post_id
GROUP BY
t.title,
t.username,
p.message
HAVING
COUNT(t.title) > 1
AND COUNT(t.username) > 1
AND COUNT(p.message) > 1;
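This query currently groups the threads correctly, but it shows only an arbitrary thread_id from each group rather than the thread_id that corresponds to MIN(view_count).
I have read about ways to solve this, but I don't know how to implement them, because it does not seem possible to order the rows within a GROUP BY query.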
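Edit
With Madhur's help, the query now returns all the thread IDs that should be deleted. However, I can't figure out how to delete the rows with a matching thread_id. Here is the query I tried. It keeps running indefinitely, whereas the SELECT query (https://stackoverflow.com/a/52314208/2469308) finishes in a few seconds: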
DELETE FROM xf_thread
WHERE thread_id IN (SELECT Substring_index(Group_concat(DISTINCT t.thread_id
ORDER BY
t.view_count
ASC
SEPARATOR ','),
',', 1) AS
thread_id_with_minimum_views
FROM (SELECT *
FROM xf_thread) t
INNER JOIN xf_post p
ON p.thread_id = t.thread_id
WHERE t.first_post_id = p.post_id
AND t.user_id = 0
AND t.reply_count < 2
GROUP BY t.title,
t.username,
p.message
HAVING Count(t.title) > 1
AND Count(t.username) > 1
AND Count(p.message) > 1
ORDER BY t.thread_id);
Answer 0 (score: 1)
A rather hacky solution is to collect the thread_id values, ordered by view_count, inside a GROUP_CONCAT. We can then use string operations to extract the thread_id with the lowest view_count.
In your SELECT clause, instead of t.thread_id, you can try the following:
SUBSTRING_INDEX(GROUP_CONCAT(DISTINCT t.thread_id
ORDER BY t.view_count ASC
SEPARATOR ','),
',',
1) AS thread_id_with_minimum_views
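For context, here is a sketch of how the expression might slot into the SELECT from the question. The table and column names are taken from the question; the minimum_views alias is only illustrative and does not appear in the original.

```sql
-- Sketch: the question's SELECT with the GROUP_CONCAT expression substituted
-- for t.thread_id. GROUP_CONCAT lists each group's thread_ids from lowest to
-- highest view_count, and SUBSTRING_INDEX keeps only the first entry.
SELECT
    SUBSTRING_INDEX(GROUP_CONCAT(DISTINCT t.thread_id
                                 ORDER BY t.view_count ASC
                                 SEPARATOR ','),
                    ',',
                    1) AS thread_id_with_minimum_views,
    MIN(t.view_count) AS minimum_views  -- illustrative alias
FROM
    xf_thread t
INNER JOIN
    xf_post p ON p.thread_id = t.thread_id
WHERE
    t.first_post_id = p.post_id
GROUP BY
    t.title,
    t.username,
    p.message
HAVING
    COUNT(t.title) > 1
    AND COUNT(t.username) > 1
    AND COUNT(p.message) > 1;
```

Note that SUBSTRING_INDEX returns a string, so the result may need a cast before it is compared with a numeric thread_id.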
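Now, building on the SELECT query that identifies the duplicate record with the fewest views, the following DELETE query removes those records from the xf_thread table: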
DELETE t_delete FROM xf_thread AS t_delete
INNER JOIN (SELECT CAST(SUBSTRING_INDEX(GROUP_CONCAT(DISTINCT t.thread_id ORDER BY t.view_count ASC SEPARATOR ','), ',', 1) AS UNSIGNED) AS tid_min_view
FROM (SELECT * FROM xf_thread) t
INNER JOIN xf_post p ON p.thread_id = t.thread_id
WHERE t.first_post_id = p.post_id
AND t.user_id = 0
AND t.reply_count < 2
GROUP BY t.title, t.username, p.message
HAVING Count(t.title) > 1
AND Count(t.username) > 1
AND Count(p.message) > 1
) AS t_dup
ON t_delete.thread_id = t_dup.tid_min_view;
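The query above removes only one thread (the least viewed) from each duplicate group per run, so a group with three or more duplicates would need repeated runs until a single thread remains. As an alternative, the sketch below removes every duplicate except the most viewed one in a single statement. It is not part of the original answer and assumes MySQL 8.0 or later, where window functions such as ROW_NUMBER() are available. The ranked and rn names are illustrative, and the tie-break on thread_id is an arbitrary choice.

```sql
-- Sketch, assuming MySQL 8.0+ (window functions).
-- Ranks the threads within each duplicate group by view_count, highest first,
-- and deletes every thread except the top-ranked one.
DELETE t_delete
FROM xf_thread AS t_delete
INNER JOIN (
    SELECT
        t.thread_id,
        ROW_NUMBER() OVER (
            PARTITION BY t.title, t.username, p.message
            ORDER BY t.view_count DESC, t.thread_id ASC  -- arbitrary tie-break
        ) AS rn
    FROM xf_thread t
    INNER JOIN xf_post p
        ON p.thread_id = t.thread_id
       AND p.post_id = t.first_post_id
    WHERE t.user_id = 0
      AND t.reply_count < 2
) AS ranked
    ON ranked.thread_id = t_delete.thread_id
WHERE ranked.rn > 1;  -- rn = 1 is the thread to keep
```

Threads without duplicates form a group of one, receive rn = 1, and are left alone. Running the inner SELECT on its own first, with rn > 1 applied in an outer query, is a cheap way to preview which rows would be removed.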