SUM(CASE WHEN ...) returns a larger number than COUNT(DISTINCT ...)

Posted: 2017-12-12 22:37:49

Tags: postgresql aggregate-functions

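I have written the same query in two forms, but I can't figure out why the second one returns a larger number than the first. The first one, which uses COUNT(DISTINCT ...), returns the correct number: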

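    WITH types(id) AS (VALUES ('{1, 4, 5, 3}'::INTEGER[])),
    date_gen64 AS (
        -- one row per calendar day in the reporting window
        SELECT CAST(generate_series(date '2017-10-01', date '2017-11-15',
                                    interval '1 day') AS date) AS days
    )
    SELECT dg.days AS c_date,
           count(DISTINCT (CASE WHEN co.id = 1 THEN p.id END)),
           count(DISTINCT (CASE WHEN co.id = 2 THEN p.id END))
    FROM person p
         JOIN envelope e ON e.personID = p.id
         JOIN "class" cl ON cl.id = p.classID
         -- "AND co.id = 1" keeps only course 1, so the second count is always 0
         JOIN course co ON co.id = cl.course_id AND co.id = 1
         -- "cr" is not defined anywhere in the query as posted
         JOIN types ON cr.type_id = ANY (types.id)
         -- keep every generated day, including days without a class
         RIGHT JOIN date_gen64 dg ON dg.days = cl.class_date
    -- group by the generated day; grouping by cl.class_date would merge all empty days into one NULL row
    GROUP BY dg.days
    ORDER BY dg.days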

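The query above returns 26, but the one below returns 27! I rewrote it with SUM because the first query is too slow. My question is: why does the second one return a larger number?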

    WITH types(id) AS (VALUES ('{1, 4, 5, 3}'::INTEGER[]))
    SELECT tmpcl.days,
           SUM(CASE WHEN tmp80.course_id = 1 THEN 1 ELSE 0 END),
           SUM(CASE WHEN tmp80.course_id = 2 THEN 1 ELSE 0 END)
    FROM (
        -- one row per calendar day in the reporting window
        SELECT CAST(generate_series(date '2017-10-01', date '2017-11-15',
                                    interval '1 day') AS date) AS days
    ) tmpcl
    LEFT JOIN (
        -- one row per distinct (person, date, course) combination
        SELECT DISTINCT p.id AS "person_id",
               cl.class_date AS c_date,
               co.id AS "course_id"
        FROM person p
             JOIN envelope e ON e.personID = p.id
             JOIN "class" cl ON cl.id = p.classID
             JOIN course co ON co.id = cl.course_id
             -- "cr" is not defined anywhere in the query as posted
             JOIN types ON cr.type_id = ANY (types.id)
        WHERE co.id IN (1, 2)
    ) tmp80 ON tmpcl.days = tmp80.c_date  -- the subquery exposes c_date, not class_date
    GROUP BY tmpcl.days
    ORDER BY tmpcl.days
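A minimal sketch of how the two aggregation styles can diverge, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. The tables are a hypothetical, cut-down version of the ones in the question (the column names person_id and class_id and all the data are invented, and course/types are left out because co.id equals cl.course_id after the join). Person 1 has two envelopes, so the join returns two rows for that person: COUNT(DISTINCT CASE ...) counts people, SUM(CASE ...) counts joined rows, and deduplicating (person, date, course) in a subquery first, as the second query does, makes the SUM count people again.

```python
import sqlite3

# Hypothetical, simplified schema and invented data (not the poster's real tables).
SCHEMA = """
CREATE TABLE class    (id INTEGER PRIMARY KEY, course_id INTEGER, class_date TEXT);
CREATE TABLE person   (id INTEGER PRIMARY KEY, class_id INTEGER);
CREATE TABLE envelope (id INTEGER PRIMARY KEY, person_id INTEGER);

INSERT INTO class VALUES (10, 1, '2017-10-02'), (20, 2, '2017-10-02');
INSERT INTO person VALUES (1, 10), (2, 10), (3, 20);
-- person 1 has two envelopes, so the join yields two rows for them
INSERT INTO envelope VALUES (100, 1), (101, 1), (102, 2), (103, 3);
"""

JOINED = """
FROM person p
JOIN envelope e ON e.person_id = p.id
JOIN class cl   ON cl.id = p.class_id
"""

# Counts distinct people: duplicate rows from the envelope join are ignored.
COUNT_DISTINCT = f"""
SELECT cl.class_date,
       COUNT(DISTINCT CASE WHEN cl.course_id = 1 THEN p.id END),
       COUNT(DISTINCT CASE WHEN cl.course_id = 2 THEN p.id END)
{JOINED}
GROUP BY cl.class_date
"""

# Counts joined rows: every extra envelope adds 1.
SUM_RAW = f"""
SELECT cl.class_date,
       SUM(CASE WHEN cl.course_id = 1 THEN 1 ELSE 0 END),
       SUM(CASE WHEN cl.course_id = 2 THEN 1 ELSE 0 END)
{JOINED}
GROUP BY cl.class_date
"""

# Deduplicates (person, date, course) first, then sums.
SUM_DEDUP = f"""
SELECT c_date,
       SUM(CASE WHEN course_id = 1 THEN 1 ELSE 0 END),
       SUM(CASE WHEN course_id = 2 THEN 1 ELSE 0 END)
FROM (SELECT DISTINCT p.id AS person_id, cl.class_date AS c_date,
                      cl.course_id AS course_id
      {JOINED}) t
GROUP BY c_date
"""

def run(sql):
    """Run one query against a fresh in-memory database."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(sql).fetchall()
    con.close()
    return rows

print(run(COUNT_DISTINCT))  # [('2017-10-02', 2, 1)]
print(run(SUM_RAW))         # [('2017-10-02', 3, 1)]
print(run(SUM_DEDUP))       # [('2017-10-02', 2, 1)]
```

On this data only the SUM over the raw join reports 3 for course 1; the other two forms report 2. Comparing the raw row count with the deduplicated count on real data is one way to check whether join fan-out is involved.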

1 Answer

Answer 0 (score: 0)

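In theory, more than one person can be enrolled in the same class on the same day. In fact, that seems to be the whole point of having classes. So whenever more than one person is assigned to the same class on the same day, your count may come out higher than the one from the first query. Does that make sense?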

You don't appear to use p.id from that inner query anywhere, so just remove it and your counts should match.

    WITH types(id) AS (VALUES ('{1, 4, 5, 3}'::INTEGER[]))
    SELECT tmpcl.days,
           SUM(CASE WHEN tmp80.course_id = 1 THEN 1 ELSE 0 END),
           SUM(CASE WHEN tmp80.course_id = 2 THEN 1 ELSE 0 END)
    FROM (
        -- one row per calendar day in the reporting window
        SELECT CAST(generate_series(date '2017-10-01', date '2017-11-15',
                                    interval '1 day') AS date) AS days
    ) tmpcl
    LEFT JOIN (
        -- p.id removed from the DISTINCT list
        SELECT DISTINCT cl.class_date AS c_date,
               co.id AS "course_id"
        FROM person p
             JOIN envelope e ON e.personID = p.id
             JOIN "class" cl ON cl.id = p.classID
             JOIN course co ON co.id = cl.course_id
             -- "cr" is not defined anywhere in the query as posted
             JOIN types ON cr.type_id = ANY (types.id)
        WHERE co.id IN (1, 2)
    ) tmp80 ON tmpcl.days = tmp80.c_date  -- the subquery exposes c_date, not class_date
    GROUP BY tmpcl.days
    ORDER BY tmpcl.days
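To see what dropping p.id changes, the sketch below feeds both versions of the inner SELECT DISTINCT to the same outer SUM(CASE ...), again with Python's sqlite3 standing in for PostgreSQL. The tables and data are hypothetical (three people in one course-1 class and one person in a course-2 class, all on the same day). DISTINCT applies to the whole select list, so the columns kept there decide what each row of tmp80 represents.

```python
import sqlite3

# Hypothetical, simplified tables and invented data (not the poster's real schema).
SCHEMA = """
CREATE TABLE class    (id INTEGER PRIMARY KEY, course_id INTEGER, class_date TEXT);
CREATE TABLE person   (id INTEGER PRIMARY KEY, class_id INTEGER);
CREATE TABLE envelope (id INTEGER PRIMARY KEY, person_id INTEGER);

INSERT INTO class VALUES (10, 1, '2017-10-02'), (20, 2, '2017-10-02');
INSERT INTO person VALUES (1, 10), (2, 10), (3, 10), (4, 20);
INSERT INTO envelope VALUES (100, 1), (101, 2), (102, 3), (103, 4);
"""

JOINED = """
FROM person p
JOIN envelope e ON e.person_id = p.id
JOIN class cl   ON cl.id = p.class_id
"""

# Inner subquery as in the question: one row per (person, date, course).
INNER_WITH_PID = f"""
SELECT DISTINCT p.id AS person_id, cl.class_date AS c_date, cl.course_id AS course_id
{JOINED}
"""

# Inner subquery as in the answer: one row per (date, course).
INNER_WITHOUT_PID = f"""
SELECT DISTINCT cl.class_date AS c_date, cl.course_id AS course_id
{JOINED}
"""

def per_day_sums(inner_sql):
    """Apply the same outer SUM(CASE ...) to the given inner subquery."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    rows = con.execute(f"""
        SELECT c_date,
               SUM(CASE WHEN course_id = 1 THEN 1 ELSE 0 END),
               SUM(CASE WHEN course_id = 2 THEN 1 ELSE 0 END)
        FROM ({inner_sql}) t
        GROUP BY c_date
    """).fetchall()
    con.close()
    return rows

print(per_day_sums(INNER_WITH_PID))     # [('2017-10-02', 3, 1)]
print(per_day_sums(INNER_WITHOUT_PID))  # [('2017-10-02', 1, 1)]
```

With p.id in the list, each tmp80 row is one person in one course on one day, so the SUM counts people (3 and 1 here). Without it, DISTINCT keeps at most one row per (date, course) pair, so each SUM is 0 or 1 per day however many people attended.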