My table looks like this:
|table_id|ref_table_id|is_used|date               |url    |
|--------|------------|-------|-------------------|-------|
|1       |1           |       |                   |abc    |
|2       |1           |       |2016-01-01 00:00:00|abc    |
|3       |1           |0      |                   |abc    |
|4       |1           |1      |                   |abc    |
|5       |2           |       |                   |       |
|6       |2           |       |2016-01-01 00:00:00|abc    |
|7       |2           |1      |                   |abc    |
|8       |2           |1      |2016-01-01 00:00:00|abc    |
|9       |2           |1      |2016-01-01 00:00:00|abc    |
|10      |3           |       |                   |       |
|11      |3           |       |2016-01-01 00:00:00|abc    |
|12      |3           |0      |                   |       |
|13      |3           |0      |                   |       |
|14      |3           |0      |2016-01-01 00:00:00|       |
|15      |3           |1      |2016-01-01 00:00:00|abc    |
|...     |...         |...    |...                |...    |
|int     |int         |boolean|timestamp          |varchar|
As you can see, there is no pattern to which combinations of the is_used, date, and url columns are empty or filled.
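Now I want to find the ref_table_id values that meet two conditions, spelled out in the query below:
- date and url are both null
- date or url is not null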
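The table has a lot of rows (~7 million), and a single ref_table_id group can contain anywhere from 50 to 600k rows.
I came up with this select, but it takes more than 2 seconds to run.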
select
distinct on (ref_table_id) t1.ref_table_id,
count(1) as my_count
from my_table t1 inner join (
select distinct t2.ref_table_id from my_table t2
where t2.is_used is not true -- null or false
and t2.url is null
and t2.date is null
group by t2.ref_table_id
) tjoin on t1.ref_table_id = tjoin.ref_table_id
where t1.is_used is not true
and (t1.date is not null
or t1.url is not null)
group by t1.ref_table_id
having count(1) < X -- a select alias cannot be referenced in having
order by 1,2;
Can I rewrite it with INTERSECT, a VIEW, or some other database feature so that it runs faster?
Answer 0 (score: 1)
This sounds like aggregation with a having clause:
select ref_table_id
from my_table t
group by ref_table_id
having sum(case when is_used = 0 and date is null and url is null
then 1 else 0 end) > 0 and
sum(case when is_used = 0 and (date is not null or url is not null)
then 1 else 0 end) >= N;
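This explicitly checks that is_used is 0, on the assumption that 0 means "not used". I am not sure what the blanks represent, so the logic may need adjusting.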
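The difference matters because a comparison with NULL is never true. The sketch below is only an illustration: it uses Python's sqlite3 module as a stand-in for the real database (which the question does not name), a hypothetical three-row table called demo, and is_used stored as 0/1/NULL. It compares the is_used = 0 test with the question's is_used is not true.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database, and
# is_used is stored as 0/1/NULL. The demo table and its rows are made up.
# Note: "is not true" needs SQLite 3.23 or newer.
conn = sqlite3.connect(":memory:")
conn.execute("create table demo (table_id integer, is_used integer)")
conn.executemany(
    "insert into demo values (?, ?)",
    [(1, None), (2, 0), (3, 1)],
)


def ids_where(predicate):
    """Return the table_id values of the rows that satisfy the predicate."""
    sql = f"select table_id from demo where {predicate} order by table_id"
    return [row[0] for row in conn.execute(sql)]


equals_zero = ids_where("is_used = 0")       # the NULL row is dropped
not_true = ids_where("is_used is not true")  # the NULL row is kept

print("is_used = 0          ->", equals_zero)
print("is_used is not true  ->", not_true)
```

If the blanks are NULL and should count as unused, is not true is probably the form to use. Also, if is_used really is a boolean column in PostgreSQL, comparing it with the integer 0 fails with an operator error as far as I know, so is_used = false or not is_used would be needed there.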
As a side note, you can simplify the query by moving the common condition on is_used into the where clause:
select ref_table_id
from my_table t
where is_used = 0 -- or is_used is NULL ??
group by ref_table_id
having sum(case when date is null and url is null
then 1 else 0 end) > 0 and
sum(case when (date is not null or url is not null)
then 1 else 0 end) >= N;
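To check the logic on the question's 15 sample rows, here is a minimal sketch using Python's sqlite3 module in place of the real database. It rests on several assumptions: every blank cell is NULL, a NULL is_used counts as unused (so the filter is is_used is not true), and the threshold follows the question's count < X rather than >= N. The helper matching_groups and the values of X are made up for the demonstration.

```python
import sqlite3

# Sample rows from the question: (table_id, ref_table_id, is_used, date, url).
# Assumption: every blank cell in the question's table is NULL.
D = "2016-01-01 00:00:00"
ROWS = [
    (1, 1, None, None, "abc"), (2, 1, None, D, "abc"),
    (3, 1, 0, None, "abc"),    (4, 1, 1, None, "abc"),
    (5, 2, None, None, None),  (6, 2, None, D, "abc"),
    (7, 2, 1, None, "abc"),    (8, 2, 1, D, "abc"),
    (9, 2, 1, D, "abc"),       (10, 3, None, None, None),
    (11, 3, None, D, "abc"),   (12, 3, 0, None, None),
    (13, 3, 0, None, None),    (14, 3, 0, D, None),
    (15, 3, 1, D, "abc"),
]

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in (assumption)
conn.execute(
    "create table my_table (table_id integer, ref_table_id integer, "
    "is_used integer, date text, url text)"
)
conn.executemany("insert into my_table values (?, ?, ?, ?, ?)", ROWS)

# One pass over the table: filter unused rows once, then test both
# conditions per group in the having clause.
QUERY = """
select ref_table_id,
       sum(case when date is not null or url is not null
                then 1 else 0 end) as my_count
from my_table
where is_used is not true  -- NULL or false, as in the question
group by ref_table_id
having sum(case when date is null and url is null
                then 1 else 0 end) > 0
   and sum(case when date is not null or url is not null
                then 1 else 0 end) < ?
order by ref_table_id
"""


def matching_groups(x):
    """Return (ref_table_id, my_count) pairs for the threshold x."""
    return conn.execute(QUERY, (x,)).fetchall()


print(matching_groups(2))
print(matching_groups(3))
```

One behavioural difference from the inner join version is worth knowing: a group that has an all-null unused row but no unused row with date or url set appears here with my_count = 0, whereas the join version leaves it out entirely.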