PostgreSQL 9.2: JOIN with aggregate takes a long time

Time: 2013-12-17 11:31:08

Tags: sql postgresql join

Good morning. I have a query that JOINs several tables, and I don't understand why so much time is spent on the aggregate. The query is:

EXPLAIN ANALYZE SELECT COUNT(handsData.id) as hands 

FROM 
  stats stats JOIN heronames players on (stats.heronameid = players.heronameid) 
  JOIN PREFLOP preflopStats on preflopStats.preflopstatsid = stats.preflopstatsid 
  JOIN FLOP flopStats on flopStats.flopstatsid = stats.flopstatsid 
  JOIN TURN turnStats on turnStats.turnstatsid = stats.turnstatsid 
  JOIN RIVER riverStats on riverStats.riverstatsid = stats.riverstatsid 
  JOIN hands AS handsData on handsData.id = stats.handid 
  JOIN (SELECT tournamentid FROM tournaments WHERE tournamentcode IS NULL) AS smallTournaments ON handsData.tournamentid = smallTournaments.tournamentid 
  JOIN (SELECT pokertypeid AS ptid, pokertype AS pokertypeName FROM poker_types) AS poker_types ON handsData.pokerType = poker_types.ptid 

WHERE players.heroName='RubenGrimes' AND players.networkNumber=2

The stats, flopstats, turnstats and riverstats tables each hold about 700k records.

The hands table holds about 120k records.

The other tables hold a few thousand records each.

"Aggregate  (cost=5219.65..5219.66 rows=1 width=8) (actual time=2080.230..2080.230 rows=1 loops=1)"
"  ->  Nested Loop  (cost=10.47..5219.31 rows=138 width=8) (actual time=20.190..2065.578 rows=119755 loops=1)"
"        Join Filter: (handsdata.pokertype = (public.poker_types.pokertypeid)::double precision)"
"        Rows Removed by Join Filter: 239510"
"        ->  Nested Loop  (cost=10.47..5211.03 rows=138 width=16) (actual time=20.169..1953.894 rows=119755 loops=1)"
"              ->  Nested Loop  (cost=10.47..4235.67 rows=138 width=20) (actual time=20.145..1641.112 rows=119937 loops=1)"
"                    ->  Nested Loop  (cost=10.47..3260.20 rows=138 width=4) (actual time=20.134..1315.970 rows=119937 loops=1)"
"                          ->  Nested Loop  (cost=10.47..2732.02 rows=138 width=8) (actual time=20.100..1047.149 rows=119937 loops=1)"
"                                ->  Nested Loop  (cost=10.47..2203.84 rows=138 width=12) (actual time=20.065..774.187 rows=119937 loops=1)"
"                                      ->  Nested Loop  (cost=10.47..1675.66 rows=138 width=16) (actual time=20.027..496.012 rows=119937 loops=1)"
"                                            ->  Nested Loop  (cost=10.47..1147.48 rows=138 width=20) (actual time=19.983..183.725 rows=119937 loops=1)"
"                                                  ->  Seq Scan on heronames players  (cost=0.00..226.41 rows=1 width=4) (actual time=1.669..1.682 rows=1 loops=1)"
"                                                        Filter: ((heroname = 'RubenGrimes'::text) AND (networknumber = 2))"
"                                                        Rows Removed by Filter: 9826"
"                                                  ->  Bitmap Heap Scan on stats  (cost=10.47..918.63 rows=244 width=24) (actual time=18.308..143.084 rows=119937 loops=1)"
"                                                        Recheck Cond: (heronameid = players.heronameid)"
"                                                        ->  Bitmap Index Scan on "stats index"  (cost=0.00..10.41 rows=244 width=0) (actual time=15.829..15.829 rows=119937 loops=1)"
"                                                              Index Cond: (heronameid = players.heronameid)"
"                                            ->  Index Only Scan using "preflop index" on preflop preflopstats  (cost=0.00..3.82 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=119937)"
"                                                  Index Cond: (preflopstatsid = stats.preflopstatsid)"
"                                                  Heap Fetches: 0"
"                                      ->  Index Only Scan using "flop index" on flop flopstats  (cost=0.00..3.82 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=119937)"
"                                            Index Cond: (flopstatsid = stats.flopstatsid)"
"                                            Heap Fetches: 0"
"                                ->  Index Only Scan using "turn index" on turn turnstats  (cost=0.00..3.82 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=119937)"
"                                      Index Cond: (turnstatsid = stats.turnstatsid)"
"                                      Heap Fetches: 0"
"                          ->  Index Only Scan using "river index" on river riverstats  (cost=0.00..3.82 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=119937)"
"                                Index Cond: (riverstatsid = stats.riverstatsid)"
"                                Heap Fetches: 0"
"                    ->  Index Scan using hands_pkey on hands handsdata  (cost=0.00..7.06 rows=1 width=20) (actual time=0.002..0.002 rows=1 loops=119937)"
"                          Index Cond: (id = stats.handid)"
"              ->  Index Scan using "tournaments index" on tournaments  (cost=0.00..7.06 rows=1 width=8) (actual time=0.002..0.002 rows=1 loops=119937)"
"                    Index Cond: (tournamentid = handsdata.tournamentid)"
"                    Filter: (tournamentcode IS NULL)"
"                    Rows Removed by Filter: 0"
"        ->  Materialize  (cost=0.00..1.04 rows=3 width=4) (actual time=0.000..0.000 rows=3 loops=119755)"
"              ->  Seq Scan on poker_types  (cost=0.00..1.03 rows=3 width=4) (actual time=0.010..0.010 rows=3 loops=1)"

Any suggestions for improving our joins?

Thanks in advance.

Edit:

I added the two missing indexes, and now the plan is:

"Aggregate  (cost=4537.05..4537.06 rows=1 width=8) (actual time=4726.083..4726.083 rows=1 loops=1)"
"  ->  Nested Loop  (cost=10.47..4536.70 rows=138 width=8) (actual time=41.193..4692.321 rows=119755 loops=1)"
"        Join Filter: (handsdata.pokertype = (public.poker_types.pokertypeid)::double precision)"
"        Rows Removed by Join Filter: 239510"
"        ->  Nested Loop  (cost=10.47..4528.42 rows=138 width=16) (actual time=41.144..4438.428 rows=119755 loops=1)"
"              ->  Nested Loop  (cost=10.47..4017.53 rows=138 width=20) (actual time=41.100..3656.244 rows=119937 loops=1)"
"                    ->  Nested Loop  (cost=10.47..3042.06 rows=138 width=4) (actual time=41.081..2925.372 rows=119937 loops=1)"
"                          ->  Nested Loop  (cost=10.47..2513.88 rows=138 width=8) (actual time=41.064..2318.648 rows=119937 loops=1)"
"                                ->  Nested Loop  (cost=10.47..1985.70 rows=138 width=12) (actual time=41.046..1708.789 rows=119937 loops=1)"
"                                      ->  Nested Loop  (cost=10.47..1457.52 rows=138 width=16) (actual time=41.028..1084.847 rows=119937 loops=1)"
"                                            ->  Nested Loop  (cost=10.47..929.34 rows=138 width=20) (actual time=40.982..388.839 rows=119937 loops=1)"
"                                                  ->  Index Scan using "big index" on heronames players  (cost=0.00..8.27 rows=1 width=4) (actual time=0.078..0.079 rows=1 loops=1)"
"                                                        Index Cond: ((networknumber = 2) AND (heroname = 'RubenGrimes'::text))"
"                                                  ->  Bitmap Heap Scan on stats  (cost=10.47..918.63 rows=244 width=24) (actual time=40.889..301.217 rows=119937 loops=1)"
"                                                        Recheck Cond: (heronameid = players.heronameid)"
"                                                        ->  Bitmap Index Scan on "stats index"  (cost=0.00..10.41 rows=244 width=0) (actual time=35.179..35.179 rows=119937 loops=1)"
"                                                              Index Cond: (heronameid = players.heronameid)"
"                                            ->  Index Only Scan using "preflop index" on preflop preflopstats  (cost=0.00..3.82 rows=1 width=8) (actual time=0.005..0.005 rows=1 loops=119937)"
"                                                  Index Cond: (preflopstatsid = stats.preflopstatsid)"
"                                                  Heap Fetches: 0"
"                                      ->  Index Only Scan using "flop index" on flop flopstats  (cost=0.00..3.82 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=119937)"
"                                            Index Cond: (flopstatsid = stats.flopstatsid)"
"                                            Heap Fetches: 0"
"                                ->  Index Only Scan using "turn index" on turn turnstats  (cost=0.00..3.82 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=119937)"
"                                      Index Cond: (turnstatsid = stats.turnstatsid)"
"                                      Heap Fetches: 0"
"                          ->  Index Only Scan using "river index" on river riverstats  (cost=0.00..3.82 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=119937)"
"                                Index Cond: (riverstatsid = stats.riverstatsid)"
"                                Heap Fetches: 0"
"                    ->  Index Scan using hands_pkey on hands handsdata  (cost=0.00..7.06 rows=1 width=20) (actual time=0.005..0.005 rows=1 loops=119937)"
"                          Index Cond: (id = stats.handid)"
"              ->  Index Only Scan using "tournament big index" on tournaments  (cost=0.00..3.69 rows=1 width=8) (actual time=0.005..0.006 rows=1 loops=119937)"
"                    Index Cond: ((tournamentcode IS NULL) AND (tournamentid = handsdata.tournamentid))"
"                    Heap Fetches: 0"
"        ->  Materialize  (cost=0.00..1.04 rows=3 width=4) (actual time=0.000..0.001 rows=3 loops=119755)"
"              ->  Seq Scan on poker_types  (cost=0.00..1.03 rows=3 width=4) (actual time=0.029..0.031 rows=3 loops=1)"
"Total runtime: 4726.684 ms"

I think the real problem is:

->  Bitmap Index Scan on "stats index"  (cost=0.00..10.41 rows=244 width=0) (actual time=35.179..35.179 rows=119937 loops=1)
      Index Cond: (heronameid = players.heronameid)

1 answer:

Answer 0 (score: 0)

Things that stand out in your plan:

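  • A suspicious type for handsdata.pokertype (double precision instead of int or bigint)
  • A missing index on heronames (networknumber, heroname)
  • A missing index on tournaments (tournamentcode, tournamentid), if you have a lot of rows that actually have a non-null code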
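As a rough sketch of the two index suggestions above, plus a quick way to confirm the column types involved: the index names here are hypothetical, and the column lists simply follow the bullets, so adjust them to the real schema.

```sql
-- Hypothetical index names; the column lists follow the suggestions above.
CREATE INDEX heronames_network_name_idx
    ON heronames (networknumber, heroname);

CREATE INDEX tournaments_code_id_idx
    ON tournaments (tournamentcode, tournamentid);

-- Confirm the declared types of the two columns compared in the Join Filter.
SELECT table_name, column_name, data_type
FROM information_schema.columns
WHERE (table_name = 'hands' AND column_name = 'pokertype')
   OR (table_name = 'poker_types' AND column_name = 'pokertypeid');
```

If the last query reports double precision for hands.pokertype and an integer type for poker_types.pokertypeid, that mismatch is what produces the cast in the plan.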

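Following up on your edit: the Bitmap Index Scan on "stats index" part is fine. The real problem is bad statistics: the planner expects 138 rows from the join, actually gets about 120k, and so picks a nested-loop plan. Run ANALYZE on each table to see whether it makes a difference. On the stats table, which has higher cardinality, first use SET STATISTICS (see: ALTER TABLE) to increase the sampling a bit.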
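A minimal sketch of that statistics refresh, assuming heronameid is the column whose estimate is off (the plan estimates 244 rows for it and gets 119937). The target of 1000 is only an illustrative value; the default target is 100.

```sql
-- Raise the sample size for the misestimated join column first
-- (1000 is an illustrative value, not a recommendation).
ALTER TABLE stats ALTER COLUMN heronameid SET STATISTICS 1000;

-- Refresh planner statistics on the tables involved in the join.
ANALYZE stats;
ANALYZE heronames;
ANALYZE hands;
ANALYZE tournaments;
ANALYZE poker_types;
ANALYZE preflop;
ANALYZE flop;
ANALYZE turn;
ANALYZE river;

-- Inspect what the planner now believes about stats.heronameid.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'stats'
  AND attname = 'heronameid';
```

After this, rerunning EXPLAIN ANALYZE shows whether the row estimate for the scan on stats has moved closer to the actual count.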

Aside from that, you still have this very, very ugly bit:

Join Filter: (handsdata.pokertype = (public.poker_types.pokertypeid)::double precision)
Rows Removed by Join Filter: 239510

Casting is reasonably fast... unless you have to do it 240k times.
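One possible way to get rid of that cast is to make the two column types match. This is only a sketch, under two assumptions: hands.pokertype holds whole numbers only, and poker_types.pokertypeid is an integer column.

```sql
-- Check first: any non-integral values would be rounded by the cast below.
SELECT count(*) AS non_integral
FROM hands
WHERE pokertype <> trunc(pokertype);

-- Assumes the check above returns 0 and that poker_types.pokertypeid is integer.
-- This rewrites the table and holds an ACCESS EXCLUSIVE lock while it runs.
ALTER TABLE hands
    ALTER COLUMN pokertype TYPE integer
    USING pokertype::integer;
```

Once both sides of the join condition share a type, the comparison no longer needs a cast on each row.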