The project uses Postgres 9.3.
I have the following tables (simplified):
t_person (30 million records)
- id
- first_name
- last_name
- gender
t_city (70,000 records)
- id
- name
- country_id
t_country (20 records)
- id
- name
t_last_city_visited (over 200 million records)
- person_id
- city_id
- country_id
- There is a unique constraint on person_id, country_id to
ensure that each person only has one last city per country
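What I need to do is a variation of the following:
Get the ids of females who have visited the country 'UK' but have never visited the country 'USA'.
I tried the following, but it is too slow.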
select t_person.id from t_person
join t_last_city_visited
    on (
        t_last_city_visited.person_id = t_person.id
        and country_id = (select id from t_country where name = 'UK')
    )
where gender = 'female'
except
(
    select t_person.id from t_person
    join t_last_city_visited
        on (
            t_last_city_visited.person_id = t_person.id
            and country_id = (select id from t_country where name = 'USA')
        )
)
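I'd really appreciate any help.
Answer 0 (score: 3)
Hint: what you want here is to find the females for whom a visit to the UK exists but a visit to the USA does not.
Something like: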
select ...
from t_person
where ...
  and exists (select null
              from t_last_city_visited join
                   t_country on (...)
              where t_country.name = 'UK')
  and not exists (select null
                  from t_last_city_visited join
                       t_country on (...)
                  where t_country.name = 'USA')
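For illustration, here is one way the hint above might be filled in. It is a sketch only: the table and column names come from the question, while the aliases (`p`, `v`, `c`) and the correlation on `v.person_id = p.id` are assumptions about what the elided parts stand for.

```sql
-- Sketch, not a tuned query: names are from the question,
-- aliases and correlation conditions are assumed.
SELECT p.id
FROM t_person p
WHERE p.gender = 'female'
  -- at least one last-city row for this person in the UK
  AND EXISTS (SELECT 1
              FROM t_last_city_visited v
              JOIN t_country c ON c.id = v.country_id
              WHERE v.person_id = p.id
                AND c.name = 'UK')
  -- and no last-city row for this person in the USA
  AND NOT EXISTS (SELECT 1
                  FROM t_last_city_visited v
                  JOIN t_country c ON c.id = v.country_id
                  WHERE v.person_id = p.id
                    AND c.name = 'USA');
```

Both subqueries are correlated on `person_id`, so the unique index on (person_id, country_id) mentioned in the question can likely serve each lookup.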
Another approach is to find the people who visited the UK but not the USA, and then join to t_person to filter by gender:
select person_id
from t_last_city_visited join
     t_country on t_last_city_visited.country_id = t_country.id
where t_country.name in ('USA', 'UK')
group by person_id
-- 'UK' sorts before 'USA', so max(name) = 'UK' means the person has no 'USA' row
having max(t_country.name) = 'UK'
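The join back to the person table that this approach mentions could look roughly like the sketch below. It uses the country name 'USA' from the question; the aliases and the derived-table name `uk_only` are hypothetical.

```sql
-- Sketch: wrap the grouped query in a derived table and
-- join it to t_person to apply the gender filter.
SELECT p.id
FROM t_person p
JOIN (
    SELECT v.person_id
    FROM t_last_city_visited v
    JOIN t_country c ON c.id = v.country_id
    WHERE c.name IN ('UK', 'USA')
    GROUP BY v.person_id
    -- only 'UK' rows remain for this person when the max is 'UK'
    HAVING max(c.name) = 'UK'
) uk_only ON uk_only.person_id = p.id
WHERE p.gender = 'female';
```

The `max()` comparison relies on 'UK' sorting before 'USA'. If that ordering feels fragile, `HAVING bool_and(c.name = 'UK')` expresses the same condition without depending on sort order.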
Answer 1 (score: 0)
Could you run ANALYZE and then execute this query?
-- females who visited UK
with uk_person as (
    select distinct person_id
    from t_last_city_visited t
    inner join t_person p on t.person_id = p.id and p.gender = 'female'
    where country_id = (select id from t_country where name = 'UK')
),
-- females who visited USA
us_person as (
    select distinct person_id
    from t_last_city_visited t
    inner join t_person p on t.person_id = p.id and p.gender = 'female'
    where country_id = (select id from t_country where name = 'USA')
)
-- females who visited UK but not USA
select uk.person_id
from uk_person uk
left join us_person us on uk.person_id = us.person_id
where us.person_id is null
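This is one of many ways to write this query. You will probably have to run the alternatives to find out which is most efficient, and some index tuning may be needed to make them run faster.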
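As a rough sketch of what such index tuning and measurement might involve: the unique constraint described in the question already provides an index that leads with person_id, so an additional index leading with country_id is one candidate worth testing. The index name below is hypothetical, and whether the index helps at all is an assumption to verify against the real data.

```sql
-- Hypothetical index name; assumes lookups that start from a country.
CREATE INDEX idx_lcv_country_person
    ON t_last_city_visited (country_id, person_id);

-- Refresh planner statistics for the table.
ANALYZE t_last_city_visited;

-- Show the actual plan, timings, and buffer usage for a candidate query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT v.person_id
FROM t_last_city_visited v
WHERE v.country_id = (SELECT id FROM t_country WHERE name = 'UK');
```

Comparing the `EXPLAIN (ANALYZE, BUFFERS)` output of each candidate query before and after the index change shows whether the planner actually uses the index.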
Answer 2 (score: 0)
This is how I would approach it; you can later replace the inner queries with aliases, as @zedfoxus suggested.
select
    v.id
from
    -- females whose last-city rows include the UK
    (SELECT
         p.id AS id
     FROM
         t_person p JOIN t_last_city_visited lcv
             ON (lcv.person_id = p.id)
         JOIN t_country c
             ON (lcv.country_id = c.id AND c.name = 'UK')
     WHERE
         p.gender = 'female') v
    LEFT JOIN
    -- females whose last-city rows include the USA
    (SELECT
         p2.id AS id
     FROM
         t_person p2 JOIN t_last_city_visited lcv2
             ON (lcv2.person_id = p2.id)
         JOIN t_country c2
             ON (lcv2.country_id = c2.id AND c2.name = 'USA')
     WHERE
         p2.gender = 'female') nv
    ON (v.id = nv.id)
-- keep only UK visitors with no matching USA row
WHERE
    nv.id IS NULL