Question

I have a query that returns data like this:

name | field | count_1 | count_2
-----|-------|---------|--------
John | aaa   |       3 |       3
John | bbb   |       3 |       3
John | ccc   |       3 |       3
John | ddd   |       1 |       1
Dave | aaa   |       3 |       3
Dave | bbb   |       3 |       3
Dave | ccc   |       3 |       3
Dave | ddd   |       3 |       3

I need to filter this data by name, keeping only the names for which count_1 and count_2 are both 3. In the case above, John's counts on field ddd do not satisfy the condition, so the query should return only Dave, regardless of the conditions John satisfies on the other fields. How can I achieve this?

If a person fails the count condition on even a single field, that person should be filtered out entirely.
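To try the answers below, the sample rows can be loaded into a scratch table. This is only a sketch: the table name the_data is borrowed from Answer 2, and the column types are assumptions, since the question does not show the schema.

```sql
-- Hypothetical scratch table; the column types are assumed, not taken from the question.
create table the_data (
    name    text,
    field   text,
    count_1 integer,
    count_2 integer
);

-- The eight rows shown in the question.
insert into the_data (name, field, count_1, count_2) values
    ('John', 'aaa', 3, 3),
    ('John', 'bbb', 3, 3),
    ('John', 'ccc', 3, 3),
    ('John', 'ddd', 1, 1),
    ('Dave', 'aaa', 3, 3),
    ('Dave', 'bbb', 3, 3),
    ('Dave', 'ccc', 3, 3),
    ('Dave', 'ddd', 3, 3);
```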

Answer 1

If I understand you correctly, NOT EXISTS might help you.

SELECT *
       FROM (<your query>) x
       WHERE NOT EXISTS (SELECT *
                                FROM (<your query>) y
                                WHERE y.name = x.name
                                      AND (y.count_1 <> 3
                                           OR y.count_2 <> 3));

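Replace <your query> with the query that produces the result you posted (or use a CTE, but be aware that this can cause performance problems in Postgres, particularly in versions before 12, where a CTE is always materialized).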
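The CTE variant mentioned above might look roughly like the following sketch. Here `select * from the_data` is only a stand-in for the asker's real query, and the_data is a hypothetical table holding the rows from the question.

```sql
-- Sketch only: the CTE body stands in for <your query>.
with t as (
    select * from the_data
)
select x.*
from t x
where not exists (
    -- Any row of the same name that misses the target disqualifies the name.
    select 1
    from t y
    where y.name = x.name
      and (y.count_1 <> 3 or y.count_2 <> 3)
);
```

Writing the query once in the CTE avoids repeating it for both aliases, at the cost of the materialization caveat noted above.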

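There may be a more elegant solution that takes a shortcut inside the query itself, but finding one would require more information about your schema and your current query.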

Answer 2

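Use the boolean aggregate bool_and() in the HAVING clause to get the names that satisfy the condition:

select name
from the_data
group by 1
having bool_and(count_1 = 3 and count_2 = 3)

 name 
------
 Dave
(1 row)

You can use the above as a subquery to filter and return the original rows, if you need them.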
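A sketch of using the aggregate query as a subquery to return the original rows, against the same the_data table as in this answer:

```sql
-- Keep every row whose name passes the bool_and() check for all of its rows.
select *
from the_data
where name in (
    select name
    from the_data
    group by name
    having bool_and(count_1 = 3 and count_2 = 3)
);
```

With the sample data from the question, this keeps Dave's four rows and drops all of John's.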

Answer 3

I think you want:

with t as (
      <your query here>
     )
select t.*
from (select t.*,
             count(*) filter (where count_1 <> 3) over (partition by name) as cnt_1_3,
             count(*) filter (where count_2 <> 3) over (partition by name) as cnt_2_3
      from t
     ) t
where cnt_1_3 = 0 and cnt_2_3 = 0;

If you don't want the original rows, I would go for aggregation:

select name
from t
group by name
having min(count_1) = max(count_1) and min(count_1) = 3 and
       min(count_2) = max(count_2) and min(count_2) = 3;

Or you could express it like this:

having sum( (count_1 <> 3)::int ) = 0 and
       sum( (count_2 <> 3)::int ) = 0

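Note that all of the above assume the counts are never NULL (which seems reasonable for something called a count). If NULL values are possible, you can use the NULL-safe comparison IS DISTINCT FROM.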
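As a rough illustration of the NULL-safe form, the aggregation could be written as below. The table name the_data is an assumption borrowed from Answer 2, and treating a NULL count as a failure is a design choice of this sketch, not something the question specifies.

```sql
-- NULL IS DISTINCT FROM 3 evaluates to true, so a NULL count disqualifies the name.
select name
from the_data
group by name
having count(*) filter (where count_1 is distinct from 3
                           or count_2 is distinct from 3) = 0;
```

By contrast, `count_1 <> 3` yields NULL for a NULL count, so the earlier variants would silently skip such rows instead of rejecting the name.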

PostgreSQL: filter groups by individual values

3 answers: