Select records matching condition A and at least X matching B

Asked: 2016-03-29 14:04:58

Tags: sql postgresql

My table looks like this:

|table_id|ref_table_id|is_used|      date         |url|
|--------+------------+-------+-------------------+---|
|1       |1           |       |                   |abc|
|2       |1           |       |2016-01-01 00:00:00|abc|
|3       |1           |0      |                   |abc|
|4       |1           |1      |                   |abc|
|5       |2           |       |                   |   |
|6       |2           |       |2016-01-01 00:00:00|abc|
|7       |2           |1      |                   |abc|
|8       |2           |1      |2016-01-01 00:00:00|abc|
|9       |2           |1      |2016-01-01 00:00:00|abc|
|10      |3           |       |                   |   |
|11      |3           |       |2016-01-01 00:00:00|abc|
|12      |3           |0      |                   |   |
|13      |3           |0      |                   |   |
|14      |3           |0      |2016-01-01 00:00:00|   |
|15      |3           |1      |2016-01-01 00:00:00|abc|
...
|int     |int         |boolean|timestamp          |varchar| 

As you can see, there is no rule governing which combinations of empty and filled values appear in the is_used, date, and url columns.

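Now I want to find each ref_table_id that satisfies both of these conditions:

  • at least one unused row with empty date and url
  • fewer than X unused rows with a filled date or url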
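
At the row level, the two conditions amount to sorting every unused row into one of two buckets. A minimal sketch of that classification, assuming the blank cells in the sample are NULL and that "unused" means is_used is false or NULL (the aliases is_empty and is_filled are illustrative names, not part of the original table):

```sql
-- Sketch only: assumes blanks are NULL and "unused" = is_used IS NOT TRUE.
select table_id,
       ref_table_id,
       (date is null and url is null)        as is_empty,   -- counts toward "at least one"
       (date is not null or url is not null) as is_filled   -- counts toward "fewer than X"
from my_table
where is_used is not true
order by ref_table_id, table_id;
```

Every unused row is exactly one of the two, so a single pass over the unused rows is enough to evaluate both conditions per ref_table_id.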

The table has many rows (~7 million), and a single ref_table_id group can contain anywhere from 50 to 600k rows.

I tried the following select, which takes more than 2 seconds to run.

select 
    distinct on (ref_table_id) t1.ref_table_id, 
    count(1) as my_count
from my_table t1 inner join (
        select distinct t2.ref_table_id from my_table t2
        where t2.is_used is not true -- null or false
            and t2.url is null 
            and t2.date is null 
        group by t2.ref_table_id
    ) tjoin on t1.ref_table_id = tjoin.ref_table_id
where t1.is_used is not true
    and (t1.date is not null
        or t1.url is not null)

group by t1.ref_table_id
having count(1) < X  -- an output alias such as my_count cannot be referenced in HAVING
order by 1,2;

Can I rewrite it using INTERSECT, a VIEW, or some other database feature so that it runs faster?

1 Answer:

Answer 0: (score: 1)

This sounds like aggregation with a having clause:

select ref_table_id
from my_table t
group by ref_table_id
having sum(case when is_used = 0 and date is null and url is null
                then 1 else 0 end) > 0 and
       sum(case when is_used = 0 and (date is not null or url is not null)
                then 1 else 0 end) >= N;

This explicitly treats is_used = 0 as meaning "not used". I'm not sure what the blanks represent, so the logic might need to be adjusted.
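
Because the question declares is_used as boolean, PostgreSQL will reject a comparison with the integer 0. What follows is a sketch of the same aggregation adapted to that schema, not the answerer's own query. It assumes that blanks are NULL, that "unused" means false or NULL (matching the question's own `is not true` filter), and that the goal is the question's "fewer than X" rather than ">= N". The literal 2 merely stands in for X. The FILTER clause requires PostgreSQL 9.4 or later.

```sql
-- Sketch only: assumes blanks are NULL and "unused" = is_used IS NOT TRUE.
select ref_table_id,
       count(*) filter (where date is not null or url is not null) as filled_unused
from my_table
where is_used is not true                       -- false or NULL
group by ref_table_id
having bool_or(date is null and url is null)    -- at least one empty unused row
   and count(*) filter (where date is not null or url is not null) < 2;  -- 2 stands in for X
```

Groups with no unused rows at all are dropped by the where clause, which is harmless here because they could never satisfy the "at least one empty unused row" condition.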

As a note, you can simplify the query by factoring the common condition on is_used out into a where clause:

select ref_table_id
from my_table t
where is_used = 0  -- or is_used is NULL ??
group by ref_table_id
having sum(case when date is null and url is null
                then 1 else 0 end) > 0 and
       sum(case when (date is not null or url is not null)
                then 1 else 0 end) >= N;
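
Since the original concern was speed, one option that may be worth testing is a partial index restricted to the unused rows, so the aggregate does not have to read rows where is_used is true. This is only a sketch under assumptions: the index name my_table_unused_idx is made up, "unused" is taken to mean is_used is not true, and whether the planner picks the index depends on the data distribution and table statistics. If url values can be very long, leaving url out of the index may be the safer choice.

```sql
-- Hypothetical index name; covers only rows where is_used is false or NULL.
create index my_table_unused_idx
    on my_table (ref_table_id, date, url)
    where is_used is not true;

-- Inspect the actual plan and timing; 2 stands in for X.
explain analyze
select ref_table_id
from my_table
where is_used is not true        -- must match the index predicate to be usable
group by ref_table_id
having bool_or(date is null and url is null)
   and count(*) filter (where date is not null or url is not null) < 2;
```

Comparing the explain analyze output before and after creating the index shows whether it actually helps for this table.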