SQL count goes wrong when joining a table to itself

Time: 2012-08-22 15:24:53

Tags: sql sql-server-2008

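Please see this fiddle. I have a table of questions, each question belongs to a category, and I have to find each user's mean for each category. I think that part works, but I want to add a total showing how many answers went into each user's means. I can't figure out what to put in the WHERE clause to actually return the total number of questions per user. Whether I count userid, QID, or choice, I get astronomical numbers.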

Query SQL:

DECLARE @tblTmpCatStats TABLE (userid NVARCHAR(10),cat1_mean FLOAT,cat2_mean FLOAT,cat3_mean FLOAT,cat4_mean FLOAT,N FLOAT)
INSERT INTO @tblTmpCatStats SELECT d.userid
    ,AVG(CAST(c1.choice AS FLOAT))
    ,AVG(CAST(c2.choice AS FLOAT))
    ,AVG(CAST(c3.choice AS FLOAT))
    ,AVG(CAST(c4.choice AS FLOAT))
    ,COUNT(d.userid)
FROM tblTmpDemographics d
JOIN tblTmpDemographics c1 ON d.userid = c1.userid
JOIN tblTmpDemographics c2 ON d.userid = c2.userid
JOIN tblTmpDemographics c3 ON d.userid = c3.userid
JOIN tblTmpDemographics c4 ON d.userid = c4.userid
WHERE c1.QID IN ('1','5')
AND c2.QID IN ('2','6')
AND c3.QID IN ('3','7')
AND c4.QID IN ('4','8')
GROUP BY d.userid
SELECT * FROM @tblTmpCatStats

I am trying to make N equal the total number of choices included in the AVG.

Setup SQL:

CREATE TABLE tblTmpDemographics (userid NVARCHAR(10),QID INT,choice NVARCHAR(1000))
INSERT INTO tblTmpDemographics (userid,QID,choice)
SELECT 'user1',1,'5' UNION ALL SELECT 'user1',2,'3' UNION ALL
SELECT 'user1',3,'4' UNION ALL SELECT 'user1',4,'5' UNION ALL
SELECT 'user1',5,'5' UNION ALL SELECT 'user1',6,'3' UNION ALL
SELECT 'user1',7,'4' UNION ALL SELECT 'user1',8,'5' UNION ALL

SELECT 'user2',1,'3' UNION ALL SELECT 'user2',2,'2' UNION ALL
SELECT 'user2',3,'3' UNION ALL SELECT 'user2',4,'5' UNION ALL
SELECT 'user2',5,'3' UNION ALL SELECT 'user2',6,'2' UNION ALL
SELECT 'user2',7,'3' UNION ALL SELECT 'user2',8,'5' UNION ALL

SELECT 'user3',1,'2' UNION ALL SELECT 'user3',2,'1' UNION ALL
SELECT 'user3',3,'5' UNION ALL SELECT 'user3',4,'5' UNION ALL
SELECT 'user3',5,'2' UNION ALL SELECT 'user3',6,'1' UNION ALL
SELECT 'user3',7,'5' UNION ALL SELECT 'user3',8,'5' UNION ALL

SELECT 'user4',1,'4' UNION ALL SELECT 'user4',2,'3' UNION ALL
SELECT 'user4',3,'3' UNION ALL SELECT 'user4',4,'5' UNION ALL
SELECT 'user4',5,'4' UNION ALL SELECT 'user4',6,'3' UNION ALL
SELECT 'user4',7,'3' UNION ALL SELECT 'user4',8,'5'
GO

Why does it return 128 instead of 8?

4 Answers:

Answer 0 (score: 2)

Try this:

SELECT d.userid
    ,AVG(CAST(c1.choice AS FLOAT))
    ,AVG(CAST(c2.choice AS FLOAT))
    ,AVG(CAST(c3.choice AS FLOAT))
    ,AVG(CAST(c4.choice AS FLOAT))
    , d.cnt
FROM
(
  SELECT userid, count(*) cnt
  from tblTmpDemographics
  group by userid
) d
INNER JOIN tblTmpDemographics c1 
  ON d.userid = c1.userid
INNER JOIN tblTmpDemographics c2 
  ON d.userid = c2.userid
INNER JOIN tblTmpDemographics c3 
  ON d.userid = c3.userid
INNER JOIN tblTmpDemographics c4 
  ON d.userid = c4.userid
WHERE c1.QID IN ('1','5')
  AND c2.QID IN ('2','6')
  AND c3.QID IN ('3','7')
  AND c4.QID IN ('4','8')
GROUP BY d.userid,  d.cnt

See SQL Fiddle with Demo
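As a rough check of the derived-table idea, the sketch below reruns a trimmed version of that query against an in-memory SQLite database from Python. SQLite standing in for SQL Server, `REAL`-style typing, and the extra `joined_rows` column are assumptions made for illustration; they are not part of the answer.

```python
import sqlite3

# Sample data from the question: questions 5-8 repeat the choices of 1-4.
choices = {"user1": [5, 3, 4, 5], "user2": [3, 2, 3, 5],
           "user3": [2, 1, 5, 5], "user4": [4, 3, 3, 5]}
rows = [(u, q, str(c[(q - 1) % 4])) for u, c in choices.items() for q in range(1, 9)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblTmpDemographics (userid TEXT, QID INTEGER, choice TEXT)")
conn.executemany("INSERT INTO tblTmpDemographics VALUES (?, ?, ?)", rows)

# d now holds one pre-counted row per user, so d.cnt is not multiplied by the joins.
# joined_rows (hypothetical extra column) shows how many joined rows still exist.
derived = conn.execute("""
    SELECT d.userid, d.cnt, COUNT(*) AS joined_rows
    FROM (SELECT userid, COUNT(*) AS cnt
          FROM tblTmpDemographics
          GROUP BY userid) d
    JOIN tblTmpDemographics c1 ON d.userid = c1.userid
    JOIN tblTmpDemographics c2 ON d.userid = c2.userid
    JOIN tblTmpDemographics c3 ON d.userid = c3.userid
    JOIN tblTmpDemographics c4 ON d.userid = c4.userid
    WHERE c1.QID IN (1, 5) AND c2.QID IN (2, 6)
      AND c3.QID IN (3, 7) AND c4.QID IN (4, 8)
    GROUP BY d.userid, d.cnt
    ORDER BY d.userid
""").fetchall()
print(derived)
```

The count comes out as 8 per user because it is computed before the joins, while the joins themselves still produce 16 rows per user.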

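Answer 1 (score: 2)

The approach you chose cannot give you correct counts, because each of the joins, even after filtering, can (and some of them do) match several rows per row. That produces mini Cartesian products in the intermediate result set, which is what ends up being aggregated. With the sample data, each user has 8 rows in d and each of c1 to c4 matches 2 of that user's rows, so every user contributes 8 × 2 × 2 × 2 × 2 = 128 joined rows.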
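The fan-out can be reproduced with a small sketch. It loads the question's sample data into an in-memory SQLite database (an assumption made for portability; the original runs on SQL Server) and counts the joined rows per user.

```python
import sqlite3

# Sample data from the question: questions 5-8 repeat the choices of 1-4.
choices = {"user1": [5, 3, 4, 5], "user2": [3, 2, 3, 5],
           "user3": [2, 1, 5, 5], "user4": [4, 3, 3, 5]}
rows = [(u, q, str(c[(q - 1) % 4])) for u, c in choices.items() for q in range(1, 9)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblTmpDemographics (userid TEXT, QID INTEGER, choice TEXT)")
conn.executemany("INSERT INTO tblTmpDemographics VALUES (?, ?, ?)", rows)

# Same join shape as the question: d is unfiltered (8 rows per user),
# and each filtered alias c1..c4 matches 2 rows per user.
fanout = conn.execute("""
    SELECT d.userid, COUNT(d.userid) AS N
    FROM tblTmpDemographics d
    JOIN tblTmpDemographics c1 ON d.userid = c1.userid
    JOIN tblTmpDemographics c2 ON d.userid = c2.userid
    JOIN tblTmpDemographics c3 ON d.userid = c3.userid
    JOIN tblTmpDemographics c4 ON d.userid = c4.userid
    WHERE c1.QID IN (1, 5) AND c2.QID IN (2, 6)
      AND c3.QID IN (3, 7) AND c4.QID IN (4, 8)
    GROUP BY d.userid
    ORDER BY d.userid
""").fetchall()
print(fanout)  # 8 * 2 * 2 * 2 * 2 = 128 joined rows per user
```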

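@bluefeet's suggestion works because the count is calculated separately, but it still does not address the Cartesian product effect. Your averages come out right only because they are averages rather than sums or counts. An average is a sum divided by a count, and since the duplication multiplies both operands by the same factor, the averages end up correct regardless of the Cartesian product. If you tried SUM or COUNT on the choice values, however, you would see wrong results again.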
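To illustrate the point about averages surviving the duplication while sums do not, here is a minimal sketch using SQLite from Python (an assumed stand-in for SQL Server, with `REAL` in place of `FLOAT`). It compares category 1 computed through the self-joins against the same aggregates computed directly.

```python
import sqlite3

# Sample data from the question: questions 5-8 repeat the choices of 1-4.
choices = {"user1": [5, 3, 4, 5], "user2": [3, 2, 3, 5],
           "user3": [2, 1, 5, 5], "user4": [4, 3, 3, 5]}
rows = [(u, q, str(c[(q - 1) % 4])) for u, c in choices.items() for q in range(1, 9)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblTmpDemographics (userid TEXT, QID INTEGER, choice TEXT)")
conn.executemany("INSERT INTO tblTmpDemographics VALUES (?, ?, ?)", rows)

# Category 1 aggregated over the self-joined rows.
joined = conn.execute("""
    SELECT d.userid, AVG(CAST(c1.choice AS REAL)), SUM(CAST(c1.choice AS REAL))
    FROM tblTmpDemographics d
    JOIN tblTmpDemographics c1 ON d.userid = c1.userid
    JOIN tblTmpDemographics c2 ON d.userid = c2.userid
    JOIN tblTmpDemographics c3 ON d.userid = c3.userid
    JOIN tblTmpDemographics c4 ON d.userid = c4.userid
    WHERE c1.QID IN (1, 5) AND c2.QID IN (2, 6)
      AND c3.QID IN (3, 7) AND c4.QID IN (4, 8)
    GROUP BY d.userid
    ORDER BY d.userid
""").fetchall()

# Category 1 aggregated directly, with no joins.
direct = conn.execute("""
    SELECT userid, AVG(CAST(choice AS REAL)), SUM(CAST(choice AS REAL))
    FROM tblTmpDemographics
    WHERE QID IN (1, 5)
    GROUP BY userid
    ORDER BY userid
""").fetchall()

for j, t in zip(joined, direct):
    print(j[0], "avg", j[1], "vs", t[1], "| sum", j[2], "vs", t[2])
```

Each c1 row is repeated 8 × 2 × 2 × 2 = 64 times in the joined set, so the sum is 64 times too large while the average is unchanged.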

You could use conditional aggregation, like this:

SELECT
  userid,
  cat1_mean = AVG(CASE WHEN QID IN (1, 5) THEN CAST(choice AS float) END),
  cat2_mean = AVG(CASE WHEN QID IN (2, 6) THEN CAST(choice AS float) END),
  cat3_mean = AVG(CASE WHEN QID IN (3, 7) THEN CAST(choice AS float) END),
  cat4_mean = AVG(CASE WHEN QID IN (4, 8) THEN CAST(choice AS float) END),
  N = COUNT(*)
FROM tblTmpDemographics
GROUP BY userid
;
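The conditional aggregation query can be tried outside SQL Server as well. The sketch below is an assumed SQLite port run from Python: `alias = expression` becomes `expression AS alias` and `float` becomes `REAL`, but the logic is otherwise the same single pass over the table.

```python
import sqlite3

# Sample data from the question: questions 5-8 repeat the choices of 1-4.
choices = {"user1": [5, 3, 4, 5], "user2": [3, 2, 3, 5],
           "user3": [2, 1, 5, 5], "user4": [4, 3, 3, 5]}
rows = [(u, q, str(c[(q - 1) % 4])) for u, c in choices.items() for q in range(1, 9)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblTmpDemographics (userid TEXT, QID INTEGER, choice TEXT)")
conn.executemany("INSERT INTO tblTmpDemographics VALUES (?, ?, ?)", rows)

# CASE without ELSE yields NULL for other categories, and AVG ignores NULLs,
# so each column averages only its own two questions. No joins, no fan-out.
stats = conn.execute("""
    SELECT
      userid,
      AVG(CASE WHEN QID IN (1, 5) THEN CAST(choice AS REAL) END) AS cat1_mean,
      AVG(CASE WHEN QID IN (2, 6) THEN CAST(choice AS REAL) END) AS cat2_mean,
      AVG(CASE WHEN QID IN (3, 7) THEN CAST(choice AS REAL) END) AS cat3_mean,
      AVG(CASE WHEN QID IN (4, 8) THEN CAST(choice AS REAL) END) AS cat4_mean,
      COUNT(*) AS N
    FROM tblTmpDemographics
    GROUP BY userid
    ORDER BY userid
""").fetchall()

for row in stats:
    print(row)
```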

Or you could use SQL Server's PIVOT feature, like this:

SELECT
  userid,
  cat1_mean,
  cat2_mean,
  cat3_mean,
  cat4_mean,
  N
FROM (
  SELECT
    userid,
    choice = CAST(choice AS float),
    QuestionGroup = CASE
      WHEN QID IN (1, 5) THEN 'cat1_mean'
      WHEN QID IN (2, 6) THEN 'cat2_mean'
      WHEN QID IN (3, 7) THEN 'cat3_mean'
      WHEN QID IN (4, 8) THEN 'cat4_mean'
    END,
    N = COUNT(*) OVER (PARTITION BY userid)
  FROM tblTmpDemographics
) s
PIVOT (
  AVG(choice) FOR QuestionGroup IN (
    cat1_mean,
    cat2_mean,
    cat3_mean,
    cat4_mean
  )
) p
;

Or like this (same as the previous one, but using a common table expression):

WITH marked AS (
  SELECT
    userid,
    choice = CAST(choice AS float),
    QuestionGroup = CASE
      WHEN QID IN (1, 5) THEN 'cat1_mean'
      WHEN QID IN (2, 6) THEN 'cat2_mean'
      WHEN QID IN (3, 7) THEN 'cat3_mean'
      WHEN QID IN (4, 8) THEN 'cat4_mean'
    END,
    N = COUNT(*) OVER (PARTITION BY userid)
  FROM tblTmpDemographics
)
SELECT
  userid,
  cat1_mean,
  cat2_mean,
  cat3_mean,
  cat4_mean,
  N
FROM marked
PIVOT (
  AVG(choice) FOR QuestionGroup IN (
    cat1_mean,
    cat2_mean,
    cat3_mean,
    cat4_mean
  )
) p
;

These approaches can be tested and played with at SQL Fiddle.

Answer 2 (score: 1)

This is a slightly crude way of doing it, but changing

COUNT(d.userID)

to

COUNT(distinct d.qid)

gives a count of 8 for each user.
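A quick way to compare the two counts side by side is the sketch below, which runs the question's join shape on an in-memory SQLite database from Python (SQLite is an assumption here, chosen only so the example is self-contained).

```python
import sqlite3

# Sample data from the question: questions 5-8 repeat the choices of 1-4.
choices = {"user1": [5, 3, 4, 5], "user2": [3, 2, 3, 5],
           "user3": [2, 1, 5, 5], "user4": [4, 3, 3, 5]}
rows = [(u, q, str(c[(q - 1) % 4])) for u, c in choices.items() for q in range(1, 9)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblTmpDemographics (userid TEXT, QID INTEGER, choice TEXT)")
conn.executemany("INSERT INTO tblTmpDemographics VALUES (?, ?, ?)", rows)

# COUNT(d.userid) counts every joined row; COUNT(DISTINCT d.QID) collapses
# the duplicates back down to the distinct questions seen for the user.
counts = conn.execute("""
    SELECT d.userid, COUNT(d.userid) AS all_rows, COUNT(DISTINCT d.QID) AS distinct_q
    FROM tblTmpDemographics d
    JOIN tblTmpDemographics c1 ON d.userid = c1.userid
    JOIN tblTmpDemographics c2 ON d.userid = c2.userid
    JOIN tblTmpDemographics c3 ON d.userid = c3.userid
    JOIN tblTmpDemographics c4 ON d.userid = c4.userid
    WHERE c1.QID IN (1, 5) AND c2.QID IN (2, 6)
      AND c3.QID IN (3, 7) AND c4.QID IN (4, 8)
    GROUP BY d.userid
    ORDER BY d.userid
""").fetchall()
print(counts)
```

Note that this counts distinct questions, which matches the number of answers only as long as each user has one row per question.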

Answer 3 (score: 0)

select userid, count(userid) cnt from tblTmpDemographics group by userid

This shows 8; you must have run the insert twice.
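One way to test the double-insert idea is to load the sample data once, run the simple count, then load it a second time and rerun both the simple count and the question's join count. This is only a sketch on an in-memory SQLite database (an assumption; the original is SQL Server), and the helper names `simple_count` and `join_count` are hypothetical.

```python
import sqlite3

# Sample data from the question: questions 5-8 repeat the choices of 1-4.
choices = {"user1": [5, 3, 4, 5], "user2": [3, 2, 3, 5],
           "user3": [2, 1, 5, 5], "user4": [4, 3, 3, 5]}
rows = [(u, q, str(c[(q - 1) % 4])) for u, c in choices.items() for q in range(1, 9)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblTmpDemographics (userid TEXT, QID INTEGER, choice TEXT)")

def simple_count():
    # The query from this answer: one count per user, no joins.
    return conn.execute("""
        SELECT userid, COUNT(userid) AS cnt
        FROM tblTmpDemographics GROUP BY userid ORDER BY userid
    """).fetchall()

def join_count():
    # The COUNT from the question's self-join query.
    return conn.execute("""
        SELECT d.userid, COUNT(d.userid)
        FROM tblTmpDemographics d
        JOIN tblTmpDemographics c1 ON d.userid = c1.userid
        JOIN tblTmpDemographics c2 ON d.userid = c2.userid
        JOIN tblTmpDemographics c3 ON d.userid = c3.userid
        JOIN tblTmpDemographics c4 ON d.userid = c4.userid
        WHERE c1.QID IN (1, 5) AND c2.QID IN (2, 6)
          AND c3.QID IN (3, 7) AND c4.QID IN (4, 8)
        GROUP BY d.userid ORDER BY d.userid
    """).fetchall()

conn.executemany("INSERT INTO tblTmpDemographics VALUES (?, ?, ?)", rows)
once = simple_count()           # data loaded a single time
once_joined = join_count()

conn.executemany("INSERT INTO tblTmpDemographics VALUES (?, ?, ?)", rows)
twice = simple_count()          # same data loaded a second time
twice_joined = join_count()

print(once[0], once_joined[0], twice[0], twice_joined[0])
```

On this data a single load gives 8 for the simple count and 128 for the join count, while a double load gives 16 and 16 × 4 × 4 × 4 × 4 = 4096, so the 128 in the question appears consistent with join fan-out on data inserted once.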