Question

I'm having trouble with a specific query on PostgreSQL.

Here is the relevant part of the EXPLAIN ANALYZE output:

                          ->  Nested Loop Left Join  (cost=21547.86..87609.16 rows=123 width=69) (actual time=28.997..562.299 rows=32710 loops=1)
                                ->  Hash Join  (cost=21547.30..87210.72 rows=123 width=53) (actual time=28.913..74.682 rows=32710 loops=1)
                                      Hash Cond: (registry.id = profile.registry_id)
                                      ->  Bitmap Heap Scan on registry  (cost=726.99..66218.46 rows=65503 width=53) (actual time=5.123..32.794 rows=66496 loops=1)
                                            Recheck Cond: ((tenant_id = 1009469) AND active AND (excluded_at IS NULL))
                                            Heap Blocks: exact=12563
                                            ->  Bitmap Index Scan on registry_tenant_id_excluded_at  (cost=0.00..710.61 rows=65503 width=0) (actual time=3.589..3.589 rows=66496 loops=1)
                                                  Index Cond: (tenant_id = 1009469)
                                      ->  Hash  (cost=20202.82..20202.82 rows=49399 width=16) (actual time=23.738..23.738 rows=32710 loops=1)
                                            Buckets: 65536  Batches: 1  Memory Usage: 2046kB
                                            ->  Index Only Scan using profile_tenant_id_registry_id on profile  (cost=0.56..20202.82 rows=49399 width=16) (actual time=0.019..19.173 rows=32710 loops=1)
                                                  Index Cond: (tenant_id = 1009469)
                                                  Heap Fetches: 29493

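The planner badly misestimates the hash join, even though the row estimates for both scans are accurate. I have already tried raising the statistics target on the relevant columns, but that only moved the estimate from 117 to 123, so I assume that is not the problem.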

Why is the estimate so far off? The nested loop that results from it makes the database do a lot of work.
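
To put numbers on the mismatch, here is a small Python sketch that pulls the estimated and actual row counts out of the plan text and prints how far off each node is. The node lines are copied from the plan above; the `NODE` regex and the `row_estimates` helper are hypothetical names made up for this illustration, not part of any PostgreSQL tooling.

```python
import re

# Node lines copied verbatim from the EXPLAIN ANALYZE output in the question.
PLAN = """\
->  Nested Loop Left Join  (cost=21547.86..87609.16 rows=123 width=69) (actual time=28.997..562.299 rows=32710 loops=1)
->  Hash Join  (cost=21547.30..87210.72 rows=123 width=53) (actual time=28.913..74.682 rows=32710 loops=1)
->  Bitmap Heap Scan on registry  (cost=726.99..66218.46 rows=65503 width=53) (actual time=5.123..32.794 rows=66496 loops=1)
->  Index Only Scan using profile_tenant_id_registry_id on profile  (cost=0.56..20202.82 rows=49399 width=16) (actual time=0.019..19.173 rows=32710 loops=1)
"""

# Hypothetical pattern for one plan node: name, estimated rows, actual rows.
NODE = re.compile(
    r"^(?P<name>.+?)\s+\(cost=\S+ rows=(?P<est>\d+) width=\d+\) "
    r"\(actual time=\S+ rows=(?P<act>\d+) loops=\d+\)"
)

def row_estimates(plan_text):
    """Return {node name: (estimated rows, actual rows, actual/estimated)}."""
    result = {}
    for line in plan_text.splitlines():
        # Drop the indentation and the leading "->" arrow before matching.
        match = NODE.match(line.lstrip(" ->"))
        if match:
            est, act = int(match["est"]), int(match["act"])
            result[match["name"]] = (est, act, act / est)
    return result

estimates = row_estimates(PLAN)
for name, (est, act, factor) in estimates.items():
    print(f"{name}: estimated {est}, actual {act}, off by x{factor:.2f}")
```

Both scans come out within a factor of two of their estimates, while the hash join is off by a factor of more than 250. The error is therefore introduced at the join itself and is not inherited from the scans.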

Answer 1

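It looks like rows with the same tenant_id also mostly have the same registry_id/registry.id values, but the planner does not know that. It assumes that registry_id = registry.id holds as often for the rows actually selected as it would for randomly chosen pairs of rows.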
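
The independence assumption can be made concrete with the numbers from the plan. The sketch below uses a simplified model of an equijoin estimate, `outer_rows * inner_rows / max(ndistinct)`. This is an assumption for illustration: the real planner also takes most-common-value lists and null fractions into account. The "implied" distinct count is backed out of the plan's own estimate, not read from `pg_stats`, and the function and variable names are hypothetical.

```python
def estimated_join_rows(outer_rows, inner_rows, ndistinct_outer, ndistinct_inner):
    """Simplified equijoin estimate: every pair of rows is assumed to match
    with probability 1 / max(ndistinct), independent of any other filter."""
    selectivity = 1.0 / max(ndistinct_outer, ndistinct_inner)
    return outer_rows * inner_rows * selectivity

# Estimated and actual row counts taken from the plan in the question.
outer_est, inner_est, join_est = 65503, 49399, 123
outer_act, inner_act, join_act = 66496, 32710, 32710

# Join selectivity the planner must have used to arrive at rows=123.
implied_selectivity = join_est / (outer_est * inner_est)
# Under the simplified model this corresponds to a table-wide distinct count
# of about 26.3 million (assumption: derived here, not observed).
implied_ndistinct = 1.0 / implied_selectivity

# Selectivity that actually held among the rows selected for this tenant.
actual_selectivity = join_act / (outer_act * inner_act)

print(f"implied ndistinct:    {implied_ndistinct:,.0f}")
print(f"planner selectivity:  {implied_selectivity:.3e}")
print(f"actual selectivity:   {actual_selectivity:.3e}")
print(f"actual / planner:     {actual_selectivity / implied_selectivity:.0f}x")
```

Under this model the planner treats a match as roughly a one-in-26-million event, as if the two sides were drawn at random from the whole tables. Among the rows that survive the tenant_id filter, each profile row in fact found exactly one partner among the 66496 registry rows, so the real selectivity is 1/66496, several hundred times higher.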

I don't think there is much you can do about this.

Query statistics on Postgres
