PostgreSQL - select count(*) for rows where a condition holds

Date: 2014-11-03 17:07:22

Tags: sql postgresql select count

I have the following table with some sample records:

  id  | attr1_id | attr2_id |      user_id      | rating_id | override_comment
------+----------+----------+-------------------+-----------+------------------
 1    |      188 |      201 | user_1@domain.com |         3 |
 2    |      193 |      201 | user_2@domain.com |         2 |
 3    |      193 |      201 | user_2@domain.com |         1 |
 4    |      194 |      201 | user_2@domain.com |         1 |
 5    |      194 |      201 | user_1@domain.com |         1 |
 6    |      192 |      201 | user_2@domain.com |         1 |

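The combination of (attr1_id, attr2_id, user_id) is UNIQUE, meaning each user can create only one record for a specific pair of attribute IDs.

My goal is to count the rows where rating_id = 1, but to count each combination of attr1_id and attr2_id only once, and only where no other row (by another user) has rating_id > 1 and refers to the same attr1_id and attr2_id. Note that the combination of attr1_id and attr2_id can be switched, so given the following two records: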

  id  | attr1_id | attr2_id |      user_id       | rating_id | override_comment
------+----------+----------+--------------------+-----------+------------------
  20  |       5  |       2  | user_1@domain.com  |         3 |
------+----------+----------+--------------------+-----------+------------------
  21  |       2  |       5  | user_2@domain.com  |         1 |

no row should be counted, because both rows refer to the same combination of attr_ids and one of them has rating_id > 1.

However, if these rows exist:

  id  | attr1_id | attr2_id |      user_id       | rating_id | override_comment
------+----------+----------+--------------------+-----------+------------------
  20  |       5  |       2  | user_1@domain.com  |         1 |
------+----------+----------+--------------------+-----------+------------------
  21  |       2  |       5  | user_2@domain.com  |         1 |
------+----------+----------+--------------------+-----------+------------------
  22  |       2  |       5  | user_3@domain.com  |         1 |

all of them should be counted as a single row, because they all share the same combination of attr1_id and attr2_id and all have rating_id = 1.
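For anyone who wants to try the queries in this thread, here is a minimal setup sketch. The column types and the integer primary key are assumptions; only the table name, the column names and the UNIQUE constraint come from the question. It loads the three rows shown above, for which the expected count is 1.

```sql
-- Hypothetical DDL: types are guesses, names and the UNIQUE constraint are from the question.
CREATE TABLE compatibility (
    id               integer PRIMARY KEY,
    attr1_id         integer NOT NULL,
    attr2_id         integer NOT NULL,
    user_id          text    NOT NULL,
    rating_id        integer NOT NULL,
    override_comment text,
    UNIQUE (attr1_id, attr2_id, user_id)
);

-- The three rows from the example above: one unordered pair {2, 5}, all rated 1.
INSERT INTO compatibility (id, attr1_id, attr2_id, user_id, rating_id) VALUES
    (20, 5, 2, 'user_1@domain.com', 1),
    (21, 2, 5, 'user_2@domain.com', 1),
    (22, 2, 5, 'user_3@domain.com', 1);
```

Changing the rating_id of row 20 to 3 reproduces the earlier case, where nothing should be counted.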

My approach so far is the following, but it selects no rows at all.

SELECT *
FROM compatibility c
WHERE rating_id > 1
  AND NOT EXISTs
    (SELECT *
     FROM compatibility c2
     WHERE c.rating_id > 1
       AND (
             (c.attr1_id = c2.attr1_id) AND (c.attr2_id = c2.attr2_id)
             OR
             (c.attr1_id = c2.attr2_id) AND (c.attr2_id = c2.attr1_id)
           )
    )

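How can I achieve this?

4 answers:

Answer 0 (score: 2)

"My goal is to count the rows where rating_id = 1, but to count each combination of attr1_id and attr2_id only once, and only where no other row (by another user) has rating_id > 1."

Based on your original

Your original query was on the right track for excluding the offending rows. You just had > instead of = in the outer WHERE clause, and the subquery has to test c2.rating_id rather than c.rating_id; otherwise every row matches itself and NOT EXISTS is never true. The tricky counting step was missing.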

SELECT count(*) AS ct
FROM  (
   SELECT 1
   FROM   compatibility c
   WHERE  rating_id = 1
   AND    NOT EXISTS (
      SELECT 1
      FROM   compatibility c2
      WHERE  c2.rating_id > 1
      AND   (c2.attr1_id = c.attr1_id AND c2.attr2_id = c.attr2_id OR
             c2.attr1_id = c.attr2_id AND c2.attr2_id = c.attr1_id))
   GROUP  BY least(attr1_id, attr2_id), greatest(attr1_id, attr2_id)
   ) sub;
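The GROUP BY above relies on least() and greatest() to put each pair into a canonical order, so that (5, 2) and (2, 5) land in the same group. A small standalone sketch of that normalization, using inline VALUES instead of the real table (the aliases t, a, b, lo and hi are made up for illustration):

```sql
-- Both input rows produce lo = 2, hi = 5, i.e. the same grouping key.
SELECT a, b,
       least(a, b)    AS lo,
       greatest(a, b) AS hi
FROM  (VALUES (5, 2), (2, 5)) AS t(a, b);
```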

Shorter

Probably faster, too.

SELECT count(*) AS ct
FROM  (
   SELECT 1  -- selecting more columns for count only would be a waste
   FROM   compatibility
   GROUP  BY least(attr1_id, attr2_id), greatest(attr1_id, attr2_id)
   HAVING every(rating_id = 1)
   ) sub;

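This is similar to @Clodoaldo's query or this earlier answer with more explanation. every(rating_id = 1) is simpler than not bool_or(rating_id > 1), but it also excludes rating < 1, which is probably fine (or even better) for your case.

MySQL does not currently implement the (standard SQL!) every(). Since you only want to eliminate rating_id > 1, this simple expression matches your requirement more closely and works in both RDBMS: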

HAVING max(rating_id) = 1
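The three HAVING conditions mentioned here differ only for groups that contain a rating below 1. A minimal sketch of that difference in Postgres, run against two made-up ratings (0 and 1) rather than the real table; the aliases t and r are illustrative:

```sql
SELECT every(r = 1)       AS all_ones,        -- false: the 0 fails r = 1
       NOT bool_or(r > 1) AS none_above_one,  -- true: no rating exceeds 1
       max(r) = 1         AS max_is_one       -- true: the highest rating is 1
FROM  (VALUES (0), (1)) AS t(r);
```

If rating_id can never be below 1, all three conditions select the same groups.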

Shortest

Uses count(*) as a window aggregate function, without a subquery.

SELECT count(*) OVER () AS ct
FROM   compatibility
GROUP  BY least(attr1_id, attr2_id), greatest(attr1_id, attr2_id)
HAVING max(rating_id) = 1
LIMIT  1;

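Window functions are applied after the aggregate step. Building on that, we get two aggregation steps in a single query level:

  1. Fold equivalent (attr1_id, attr2_id) pairs, excluding groups that contain a disqualifying rating_id.
  2. Count the remaining rows with a window function over the whole set.
  3. LIMIT 1 returns a single row (all rows are identical).

MySQL had no window functions at the time of writing, so this is Postgres only. Shortest, not necessarily fastest.

SQL Fiddle. (On pg 9.2, since pg 9.3 is currently offline.)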
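One caveat with the window-function variant: when no group qualifies, the query returns no row at all instead of a row containing 0, because LIMIT 1 has nothing to return. If a 0 is needed in that case, one possible workaround (a sketch, not part of the original answer) is to wrap the query in a scalar subquery and apply coalesce():

```sql
-- The scalar subquery yields NULL when no group qualifies; coalesce turns that into 0.
SELECT coalesce(
         (SELECT count(*) OVER ()
          FROM   compatibility
          GROUP  BY least(attr1_id, attr2_id), greatest(attr1_id, attr2_id)
          HAVING max(rating_id) = 1
          LIMIT  1)
       , 0) AS ct;
```

The subquery versions with a plain count(*) further up already return 0 in that situation.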

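Answer 1 (score: 1)

If I understand correctly, you want the pairs of attributes whose rating is always "1".

This should give you those attribute pairs: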

select least(attr1_id, attr2_id) as a1, greatest(attr1_id, attr2_id) as a2,
       min(rating_id) as minri, max(rating_id) as maxri
from compatibility c
group by least(attr1_id, attr2_id), greatest(attr1_id, attr2_id)
having min(rating_id) = 1 and max(rating_id) = 1;

To get the count, just use it as a subquery:

select count(*)
from (select least(attr1_id, attr2_id) as a1, greatest(attr1_id, attr2_id) as a2,
             min(rating_id) as minri, max(rating_id) as maxri
      from compatibility c
      group by least(attr1_id, attr2_id), greatest(attr1_id, attr2_id)
      having min(rating_id) = 1 and max(rating_id) = 1
     ) c

Answer 2 (score: 1)

Doing it in PostgreSQL. SQL Fiddle is not working right now:

select count(*)
from (
    select least(attr1_id, attr2_id), greatest(attr1_id, attr2_id)
    from compatibility
    group by 1, 2
    having not bool_or(rating_id > 1)
) s
;
 count 
-------
     2
(1 row)

Answer 3 (score: 0)

I would use CASE .. WHEN to rearrange the attributes so that the smaller one always comes first, which fixes the order. An example query follows:

SELECT attrSmall, 
       attrLarge,            
       MAX(rating_id) as ratingMax
  FROM (
   SELECT CASE WHEN c.attr1_id < c.attr2_id 
               THEN c.attr1_id 
               ELSE c.attr2_id END as attrSmall,
          CASE WHEN c.attr1_id < c.attr2_id 
               THEN c.attr2_id 
               ELSE c.attr1_id END as attrLarge,
          c.rating_id
    FROM compatibility c) as c1
  GROUP BY attrSmall, attrLarge
  HAVING MAX(rating_id) = 1  -- PostgreSQL does not accept the output alias ratingMax in HAVING