EXPLAIN ANALYZE
SELECT count(*)
FROM "businesses"
WHERE (
source = 'facebook'
OR EXISTS(
SELECT *
FROM provider_business_map pbm
WHERE
pbm.hotstepper_business_id=businesses.id
AND pbm.provider_name='facebook'
)
);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=233538965.74..233538965.75 rows=1 width=0) (actual time=116169.720..116169.721 rows=1 loops=1)
-> Seq Scan on businesses (cost=0.00..233521096.48 rows=7147706 width=0) (actual time=11.284..116165.646 rows=3693 loops=1)
Filter: (((source)::text = 'facebook'::text) OR (alternatives: SubPlan 1 or hashed SubPlan 2))
SubPlan 1
-> Index Scan using idx_provider_hotstepper_business on provider_business_map pbm (cost=0.00..16.29 rows=1 width=0) (never executed)
Index Cond: (((provider_name)::text = 'facebook'::text) AND (hotstepper_business_id = businesses.id))
SubPlan 2
-> Index Scan using idx_provider_hotstepper_business on provider_business_map pbm (cost=0.00..16.28 rows=1 width=4) (actual time=0.045..5.685 rows=3858 loops=1)
Index Cond: ((provider_name)::text = 'facebook'::text)
Total runtime: 116169.820 ms
(10 rows)
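This query takes well over a minute (about 116 seconds in the run above), and the resulting count is only a few thousand (3,693 here). The bottleneck appears to be the sequential scan, but I'm not sure which indexes the database needs to optimize it. It's also worth noting that I haven't tuned postgres at all yet, so any tuning that might help is worth considering. That said, my database is 15GB and I don't plan to fit all of it in memory in the near future, so I'm not sure how much changing the RAM-related settings would help.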
Answer 0 (score: 2)
OR is notorious for performing poorly. Try splitting it into a union of two completely separate queries, one on each table:
SELECT COUNT(*) FROM (
SELECT id
FROM businesses
WHERE source = 'facebook'
UNION -- union makes the ids unique in the result
SELECT hotstepper_business_id
FROM provider_business_map
WHERE provider_name = 'facebook'
AND hotstepper_business_id IS NOT NULL
) x
If hotstepper_business_id cannot be null, you can remove the line
AND hotstepper_business_id IS NOT NULL
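Whether that line is needed can be checked directly. A minimal sketch, assuming only the table and column names from the question (the `null_ids` alias is made up for illustration):

```sql
-- Is the column declared NOT NULL? ('NO' means nulls are not allowed)
SELECT is_nullable
FROM information_schema.columns
WHERE table_name = 'provider_business_map'
  AND column_name = 'hotstepper_business_id';

-- Even if nulls are allowed, are there actually any for this provider?
SELECT count(*) AS null_ids
FROM provider_business_map
WHERE provider_name = 'facebook'
  AND hotstepper_business_id IS NULL;
```

If nulls do exist and the filter is dropped, the UNION would contribute one extra NULL row, so the count would be off by one.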
If you want the whole business rows, you can simply wrap the above query in IN (...):
SELECT * FROM businesses
WHERE ID IN (
-- above inner query
)
But a much better-performing query is to modify the above query to use a join:
SELECT *
FROM businesses
WHERE source = 'facebook'
UNION
SELECT b.*
FROM provider_business_map m
JOIN businesses b
ON b.id = m.hotstepper_business_id
WHERE provider_name = 'facebook'
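To see whether the rewrite actually helps on your data, the count version can be run under EXPLAIN ANALYZE and compared with the plan in the question. This is only a sketch using the names from the question; timings will depend on your indexes and data:

```sql
EXPLAIN ANALYZE
SELECT count(*)
FROM (
    SELECT id
    FROM businesses
    WHERE source = 'facebook'
    UNION
    SELECT hotstepper_business_id
    FROM provider_business_map
    WHERE provider_name = 'facebook'
      AND hotstepper_business_id IS NOT NULL
) x;
```

Two things to keep in mind: without an index on businesses.source the first branch still has to scan businesses sequentially, and the count only matches the original query if every hotstepper_business_id refers to an existing row in businesses (for example, through a foreign key).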
Answer 1 (score: 1)
At the very least, I would try rewriting the dependent subquery as a join:
SELECT COUNT(DISTINCT b.id) -- assumes id uniquely identifies a business; avoids comparing whole rows
FROM businesses b
LEFT JOIN provider_business_map pbm
ON b.id=pbm.hotstepper_business_id
WHERE b.source = 'facebook'
OR pbm.provider_name = 'facebook';
Unless I'm misreading something, there is already an index on businesses.id, but make sure that provider_business_map.hotstepper_business_id, businesses.source and provider_business_map.provider_name are indexed as well for best results.
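The indexes mentioned above could be created roughly as follows. The index names are hypothetical, and the plan in the question suggests that idx_provider_hotstepper_business may already cover provider_name and hotstepper_business_id, in which case the last two statements could be redundant:

```sql
-- Index names below are hypothetical; pick names that fit your conventions.
CREATE INDEX idx_businesses_source
    ON businesses (source);

CREATE INDEX idx_pbm_hotstepper_business_id
    ON provider_business_map (hotstepper_business_id);

CREATE INDEX idx_pbm_provider_name
    ON provider_business_map (provider_name);

-- Refresh planner statistics so the new indexes are costed sensibly.
ANALYZE businesses;
ANALYZE provider_business_map;
```

Listing the existing indexes first (for example with `\d provider_business_map` in psql) avoids creating duplicates.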
Answer 2 (score: 1)
create index index_name on businesses(source);
Since only 3,693 of the more than 7 million rows match, the index will probably be used. Don't forget to run
analyse businesses;
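After that, one way to check what the planner now knows about the column, and whether it picks the new index, might look like this (a sketch; it assumes the index above has been created and that businesses is on the default search path):

```sql
-- pg_stats exposes the per-column statistics gathered by ANALYZE.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'businesses'
  AND attname = 'source';

-- Plain EXPLAIN shows the chosen plan without running the query.
EXPLAIN
SELECT count(*)
FROM businesses
WHERE source = 'facebook';
```

If the plan still shows a Seq Scan on businesses, the planner probably estimates that too many rows have source = 'facebook' for the index to pay off.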