I have four different tables in my database:
thread:
thread_rating:
thread_report:
thread_impression:
I join these tables with the following SQL query:
SELECT t.thread_id,
t.thread_content,
SUM(tra.liked) AS liked,
SUM(tra.disliked) AS disliked,
t.timestamp,
((100*(tra.liked + SUM(tra.liked))) / (tra.liked + SUM(tra.liked) + (tra.disliked + SUM(tra.disliked)))) AS liked_percent,
((100*(COUNT(DISTINCT tre.thread_report_id)) / ((COUNT(DISTINCT ti.thread_impression_id))))) AS reported_percent
FROM thread AS t
LEFT JOIN thread_rating AS tra ON t.thread_id = tra.thread_id
LEFT JOIN thread_report AS tre ON tra.thread_id = tre.thread_id
LEFT JOIN thread_impression AS ti ON tre.thread_id = ti.thread_id
GROUP BY t.thread_id
ORDER BY liked_percent
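The query should return every thread_id with its summed likes and dislikes, the percentage of likes, the timestamp at which the thread was inserted into the database, and the percentage of reports (relative to the number of times the thread was shown to users).
Almost all of the results are correct; the only values that are off are the likes and dislikes.
If I add a COUNT(*) to the query, I can see that the rows with correct results have a count of 1, while the wrong ones sometimes have a count of up to 60. It looks like a cross join problem.
I think it is a problem with the grouping, or that I need to change the joins.
I have seen solutions that use subselects, but I don't think that is a good way to solve this problem.
What am I doing wrong here?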
Answer 0 (score: 2)
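The tra table has multiple records per thread_id. Once the other tables are joined in, each of those records is repeated, which causes double counting in the SUM function.
Do the summing in a subselect, grouped by the join field.
That way tra2 contains only one row per thread_id to join on, and the duplicate rows are avoided.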
SELECT t.thread_id,
t.thread_content,
tra2.liked,
tra2.disliked,
t.timestamp,
tra2.liked_percent,
((100*(COUNT(DISTINCT tre.thread_report_id)) / ((COUNT(DISTINCT ti.thread_impression_id))))) AS reported_percent
FROM thread AS t
LEFT JOIN (
SELECT
tra.thread_id
, SUM(tra.liked) AS liked
, SUM(tra.disliked) AS disliked
, (100 * SUM(tra.liked)) / (SUM(tra.liked) + SUM(tra.disliked)) AS liked_percent
FROM thread_rating AS tra
GROUP BY tra.thread_id
) as tra2 ON t.thread_id = tra2.thread_id
LEFT JOIN thread_report AS tre ON t.thread_id = tre.thread_id
LEFT JOIN thread_impression AS ti ON t.thread_id = ti.thread_id
GROUP BY t.thread_id
ORDER BY liked_percent DESC
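
To see the double counting on concrete numbers, here is a small sketch using Python's built-in sqlite3 module. The schema, the id column thread_rating_id, and the sample rows are assumptions made up for illustration (only the columns the queries reference are included), and SQLite stands in for whatever database is actually in use. One thread gets 3 ratings, 2 reports and 5 impressions, so the plain join produces 3 x 2 x 5 = 30 rows and every rating is summed 10 times.

```python
import sqlite3

# Hypothetical minimal schema: just the columns the queries above refer to.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE thread (thread_id INTEGER PRIMARY KEY, thread_content TEXT);
CREATE TABLE thread_rating (
    thread_rating_id INTEGER PRIMARY KEY,
    thread_id INTEGER, liked INTEGER, disliked INTEGER);
CREATE TABLE thread_report (
    thread_report_id INTEGER PRIMARY KEY, thread_id INTEGER);
CREATE TABLE thread_impression (
    thread_impression_id INTEGER PRIMARY KEY, thread_id INTEGER);

INSERT INTO thread VALUES (1, 'hello');
-- 2 likes and 1 dislike
INSERT INTO thread_rating (thread_id, liked, disliked)
    VALUES (1, 1, 0), (1, 1, 0), (1, 0, 1);
-- 2 reports
INSERT INTO thread_report (thread_id) VALUES (1), (1);
-- 5 impressions
INSERT INTO thread_impression (thread_id) VALUES (1), (1), (1), (1), (1);
""")

# Plain joins: every rating row is repeated once per (report, impression) pair.
naive = conn.execute("""
SELECT COUNT(*), SUM(tra.liked), SUM(tra.disliked)
FROM thread AS t
LEFT JOIN thread_rating AS tra ON t.thread_id = tra.thread_id
LEFT JOIN thread_report AS tre ON t.thread_id = tre.thread_id
LEFT JOIN thread_impression AS ti ON t.thread_id = ti.thread_id
GROUP BY t.thread_id
""").fetchone()

# Ratings summed in a subselect first: one tra2 row per thread_id.
fixed = conn.execute("""
SELECT tra2.liked, tra2.disliked,
       COUNT(DISTINCT tre.thread_report_id),
       COUNT(DISTINCT ti.thread_impression_id)
FROM thread AS t
LEFT JOIN (
    SELECT thread_id, SUM(liked) AS liked, SUM(disliked) AS disliked
    FROM thread_rating
    GROUP BY thread_id
) AS tra2 ON t.thread_id = tra2.thread_id
LEFT JOIN thread_report AS tre ON t.thread_id = tre.thread_id
LEFT JOIN thread_impression AS ti ON t.thread_id = ti.thread_id
GROUP BY t.thread_id
""").fetchone()

print(naive)  # (30, 20, 10): row count, inflated likes, inflated dislikes
print(fixed)  # (2, 1, 2, 5): likes, dislikes, reports, impressions
```

The report and impression counts survive the remaining row multiplication only because they use COUNT(DISTINCT ...); a plain SUM has no such protection, which is why the ratings need to be aggregated before the join.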