Query returns the wrong count

Time: 2012-06-05 02:32:26

Tags: count

I have the following query.

SELECT a.link_field1 AS journo, count(a.link_id) as articles, AVG( b.vote_value ) AS score FROM dan_links a LEFT JOIN dan_votes b ON link_id = vote_link_id WHERE link_field1 <> '' and link_status NOT IN ('discard', 'spam', 'page') GROUP BY link_field1 ORDER BY link_field1, link_id

This query returns a count of 3 for the first item in the list. What it should return is

Journo | count | score
John S | 2 | 6.00
Joe B | 1 | 4

However, for the first row, John S, it returns a count of 3.

If I query the table directly

select * from dan_links where link_field1 = 'John S' 

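I get 2 records, as expected. I can't for the life of me work out why the count is wrong, unless for some reason it is counting records from the dan_votes table.

How do I get the correct count, or is my query completely wrong?

Edit: contents of the tables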

dan_links

link_id | link_field1 | link | source | link_status
1 | John S | http://test.com | test.com | approved
2 | John S | http://google.com | google | approved
3 | Joe B | http://facebook.com | facebook | approved

dan_votes

vote_id | link_id | vote_value
1 | 1 | 5
2 | 1 | 8
3 | 2 | 4
4 | 3 | 1

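Edit: for some reason, it seems to be counting the rows in the votes table

1 Answer:

Answer 0: (score: 0)

When you do a left outer join on the condition link_id = vote_link_id, one row is produced for each matching pair of records, something like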

link_id | link_field1 | link | source | link_status | vote_id | vote_value
1 | John S | http://test.com | test.com | approved | 1 | 5
1 | John S | http://test.com | test.com | approved | 2 | 8
2 | John S | http://google.com | google | approved | 3 | 4
3 | Joe B | http://facebook.com | facebook | approved | 4 | 1

Now when you group on link_field1, the count for John S is 3, because link 1 appears once per vote.
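To make the row multiplication concrete, here is a minimal sketch that loads the sample data from the question into an in-memory SQLite database through Python's sqlite3 module. It assumes the votes table references links through a column named vote_link_id, as the query in the question implies (the table listing calls it link_id), and it only creates the columns the query touches. It also shows COUNT(DISTINCT ...) as one possible way to count articles rather than joined rows; that is a suggestion of this sketch, not something the answer proposes.

```python
import sqlite3

# Assumption: dan_votes points at dan_links via vote_link_id,
# which is the column name used in the question's query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dan_links (link_id INTEGER, link_field1 TEXT, link_status TEXT);
    CREATE TABLE dan_votes (vote_id INTEGER, vote_link_id INTEGER, vote_value INTEGER);
    INSERT INTO dan_links VALUES
        (1, 'John S', 'approved'), (2, 'John S', 'approved'), (3, 'Joe B', 'approved');
    INSERT INTO dan_votes VALUES (1, 1, 5), (2, 1, 8), (3, 2, 4), (4, 3, 1);
""")

# The raw join: link 1 has two votes, so it appears twice.
joined_rows = conn.execute("""
    SELECT a.link_id, a.link_field1, b.vote_id, b.vote_value
    FROM dan_links a
    LEFT JOIN dan_votes b ON a.link_id = b.vote_link_id
    ORDER BY a.link_id, b.vote_id
""").fetchall()

# COUNT(a.link_id) counts joined rows; COUNT(DISTINCT a.link_id) counts articles.
counts = conn.execute("""
    SELECT a.link_field1,
           COUNT(a.link_id) AS joined_row_count,
           COUNT(DISTINCT a.link_id) AS articles
    FROM dan_links a
    LEFT JOIN dan_votes b ON a.link_id = b.vote_link_id
    GROUP BY a.link_field1
    ORDER BY a.link_field1
""").fetchall()

for row in counts:
    print(row)
```

With the sample data, John S has three joined rows but only two distinct link_id values, which matches the count the question expects.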

A nested query might work

SELECT journo, COUNT(linkid) AS articles, AVG(score) AS score
FROM
-- per_link (alias added here; MySQL requires one): one row per article with its average vote
(SELECT a.link_field1 AS journo, AVG( b.vote_value ) AS score, a.link_id AS linkid
FROM dan_links a
LEFT JOIN dan_votes b
ON a.link_id = b.vote_link_id
WHERE a.link_field1 <> ''
AND a.link_status NOT IN ('discard', 'spam', 'page')
GROUP BY a.link_id, a.link_field1) per_link
GROUP BY journo
ORDER BY journo

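The query above gives the wrong average, because an average of per-article averages is not the overall average: ((n1 + n2)/2 + n3)/2 != (n1 + n2 + n3)/3. For John S, ((5 + 8)/2 + 4)/2 = 5.25, while (5 + 8 + 4)/3 is about 5.67. So use the following query instead, which divides the total of the votes by the number of votes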

SELECT journo, COUNT(linkid) AS articles, SUM(vote_sum) / SUM(vote_count) AS score
FROM
    -- per_link: one row per article, carrying that article's vote total and vote count
    -- COUNT( b.vote_value ) skips the NULL produced for an article with no votes
    (SELECT a.link_field1 AS journo, SUM( b.vote_value ) AS vote_sum, a.link_id AS linkid, COUNT( b.vote_value ) AS vote_count
    FROM dan_links a
    LEFT JOIN dan_votes b
    ON a.link_id = b.vote_link_id
    WHERE a.link_field1 <> ''
    AND a.link_status NOT IN ('discard', 'spam', 'page')
    GROUP BY a.link_id, a.link_field1) per_link
GROUP BY journo
ORDER BY journo
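As a check on the approach, the sketch below runs a two-level aggregation against the question's sample data in an in-memory SQLite database. It is an illustration under assumptions rather than the original poster's exact setup: the join column is taken to be vote_link_id, the names per_link, vote_count and link_avg are introduced here, the per-link vote count uses COUNT(b.vote_value) so that links without votes contribute zero, and the multiplication by 1.0 is only there because SQLite would otherwise do integer division.

```python
import sqlite3

# Assumption: dan_votes points at dan_links via vote_link_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dan_links (link_id INTEGER, link_field1 TEXT, link_status TEXT);
    CREATE TABLE dan_votes (vote_id INTEGER, vote_link_id INTEGER, vote_value INTEGER);
    INSERT INTO dan_links VALUES
        (1, 'John S', 'approved'), (2, 'John S', 'approved'), (3, 'Joe B', 'approved');
    INSERT INTO dan_votes VALUES (1, 1, 5), (2, 1, 8), (3, 2, 4), (4, 3, 1);
""")

per_journo = conn.execute("""
    SELECT journo,
           COUNT(linkid) AS articles,
           SUM(vote_sum) * 1.0 / SUM(vote_count) AS score,  -- total votes / number of votes
           AVG(link_avg) AS avg_of_avgs                     -- kept only for comparison
    FROM (SELECT a.link_field1 AS journo,
                 a.link_id AS linkid,
                 SUM(b.vote_value) AS vote_sum,
                 COUNT(b.vote_value) AS vote_count,
                 AVG(b.vote_value) AS link_avg
          FROM dan_links a
          LEFT JOIN dan_votes b ON a.link_id = b.vote_link_id
          WHERE a.link_field1 <> ''
            AND a.link_status NOT IN ('discard', 'spam', 'page')
          GROUP BY a.link_id, a.link_field1) per_link
    GROUP BY journo
    ORDER BY journo
""").fetchall()

for journo, articles, score, avg_of_avgs in per_journo:
    print(journo, articles, round(score, 2), round(avg_of_avgs, 2))
```

For John S this yields 2 articles, a score of 17/3 (about 5.67) and an average of averages of 5.25, so the two formulas really do disagree on this data. A journalist whose articles have no votes at all would get a NULL score, since the vote count sums to zero.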

Hope this helps.