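I tested the following query on two databases with exactly the same structure. On the first one, where the table has 4M rows, it returns results in 33 seconds. On the second, the table has 29M rows; it has been 16 hours since I started the query and I still have no result.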
SELECT sbvpip*4 as smallbvpip,btnvpip*4 as buttonvpip, sum(amt_won)*400/count(*) AS winrate, count(*) as count
FROM holdem_hand_player_statistics
JOIN (
SELECT id_player AS pid2, id_hand AS hid, sbvpip
FROM holdem_hand_player_statistics
JOIN (
SELECT id_player AS pid, ROUND(avg(flg_vpip::int)*25) AS sbvpip
FROM holdem_hand_player_statistics
WHERE position = 8 AND cnt_players = 6
GROUP BY id_player
) AS auxtable
ON pid = id_player
WHERE position = 8 AND cnt_players = 6
) AS auxtable2
ON hid = id_hand
JOIN (
SELECT id_player AS pid4, id_hand AS hid2, btnvpip
FROM holdem_hand_player_statistics
JOIN (
SELECT id_player AS pid3, ROUND(avg(flg_vpip::int)*25) AS btnvpip
FROM holdem_hand_player_statistics
WHERE position = 0 AND cnt_players = 6
GROUP BY id_player
) AS auxtable3
ON pid3 = id_player
WHERE position = 0 AND cnt_players = 6
) AS auxtable4
ON hid2 = id_hand
WHERE POSITION = 0 and cnt_players = 6
GROUP BY sbvpip,btnvpip
ORDER BY 1,2;
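How can I make this query run faster?
Could the table be corrupted or something like that? One table is only 7~8 times larger than the other, yet processing takes 15000 times longer. Is that normal?
Any other comments are welcome!
If my English is unclear, please tell me and I will try to express myself differently.
Thanks a lot for your help.
Additional information:
Of the columns I use, three are indexed: id_hand, id_player and position. The primary key is (id_hand, id_player). The table has 129 columns and 6 indexes in total.
I also ran EXPLAIN on both tables and got different results. Both results are in a gdocs spreadsheet: https://spreadsheets.google.com/ccc?key=tGxqxVNzHYznb1VVjtKyAuw&authkey=CJ-BiYkN&authkey=CJ-BiYkN#gid=0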
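Answer 0 (score: 3)
I would suggest that the indexes on one of the servers are either missing or incorrect.
Blocking could also be preventing the query from finishing, especially if there is an uncommitted transaction sitting there.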
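Both possibilities can be checked from psql. The following is a minimal sketch, assuming PostgreSQL 9.6 or later, where pg_stat_activity exposes the pid, state and wait_event_type columns; older releases name these columns differently.

```sql
-- List the indexes defined on the table; run this on both servers and compare.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'holdem_hand_player_statistics';

-- Sessions that hold a transaction open without doing any work.
SELECT pid, state, xact_start, query
FROM pg_stat_activity
WHERE state = 'idle in transaction';

-- Sessions that are currently waiting on a lock.
SELECT pid, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
```

If the long-running query shows up in the last result, it is waiting on another session rather than working.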
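Answer 1 (score: 2)
You are probably using more sort memory for the larger number of rows: what is your work_mem setting? The same goes for the buffer cache: because you scan the same table several times, getting its rows to fit in the cache may be crucial.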
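The current values can be read with SHOW, and work_mem can be raised for a single session to see whether it changes the plan or the run time. The 256MB figure below is only an illustrative assumption, not a recommendation for this server.

```sql
-- Inspect the current memory settings.
SHOW work_mem;
SHOW shared_buffers;

-- Raise work_mem for this session only (illustrative value), then re-run the query.
SET work_mem = '256MB';
```

Unlike work_mem, shared_buffers cannot be changed per session; it is set in postgresql.conf and needs a server restart.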
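You should also re-examine the query and try to find a way to avoid joining the statistics table back to itself so many times. Without at least a small set of test data and the expected output, it is hard to make suggestions. Which version of PostgreSQL are you using? With 8.4 you could at least get auxtable and auxtable3 from a single CTE...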
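One possible shape for such a rewrite is sketched below. It computes both per-player averages in a single CTE (named vpip here, a made-up name) and drops the auxtable4 self-join by assuming that each hand has exactly one row at position 0, so the outer row is itself the button row. That assumption is not confirmed by the question and should be verified against the data before comparing results.

```sql
WITH vpip AS (
    -- One pass over the table yields the averages for both positions.
    SELECT id_player, position,
           ROUND(avg(flg_vpip::int)*25) AS vpip
    FROM holdem_hand_player_statistics
    WHERE cnt_players = 6 AND position IN (0, 8)
    GROUP BY id_player, position
)
SELECT sb_v.vpip*4 AS smallbvpip,
       btn_v.vpip*4 AS buttonvpip,
       sum(btn.amt_won)*400/count(*) AS winrate,
       count(*) AS count
FROM holdem_hand_player_statistics AS btn
-- Small-blind row of the same hand.
JOIN holdem_hand_player_statistics AS sb
  ON sb.id_hand = btn.id_hand
 AND sb.position = 8 AND sb.cnt_players = 6
JOIN vpip AS sb_v
  ON sb_v.id_player = sb.id_player AND sb_v.position = 8
JOIN vpip AS btn_v
  ON btn_v.id_player = btn.id_player AND btn_v.position = 0
WHERE btn.position = 0 AND btn.cnt_players = 6
GROUP BY sb_v.vpip, btn_v.vpip
ORDER BY 1, 2;
```

This reads the big table three times instead of five; whether that is enough depends on the plan PostgreSQL chooses.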
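Answer 2 (score: 1)
The query looks fine. To improve performance, try checking the indexes as @HLGEM suggested. Also try executing each subquery on its own to see which one performs poorly.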
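For example, the innermost aggregate from the question can be timed by itself with EXPLAIN ANALYZE, which actually executes the statement and reports real row counts and timings for each plan node:

```sql
-- Time the innermost subquery (auxtable) in isolation.
EXPLAIN ANALYZE
SELECT id_player AS pid, ROUND(avg(flg_vpip::int)*25) AS sbvpip
FROM holdem_hand_player_statistics
WHERE position = 8 AND cnt_players = 6
GROUP BY id_player;
```

Repeating this for each nesting level on both servers should show where the two plans start to diverge.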
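Answer 3 (score: 1)
I can easily believe that this query would take that much longer. You have a 29M row table on which you are doing several GROUP BYs and which you join back to itself multiple times on different columns. If the whole table does not fit in memory, there could be a lot of paging involved that the table with 1/7 of the rows does not need. Working from the inside out, you are:
Could you split the table into separate tables? What exactly do your fields mean, and what does a sample hand look like?
At a minimum you need indexes on id_player, id_hand, position and cnt_players.
It may help to include all the fields in the index. I am not sure about PostgreSQL, but SQL Server can skip loading the actual table data pages if all the data the query needs is in the index. So if you had an index on position, cnt_players, id_player and flg_vpip, your innermost selects could be much faster.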
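In PostgreSQL the equivalent feature is the index-only scan, available from version 9.2 onward. A sketch of such a covering index follows; the index name is made up for illustration.

```sql
-- Covering index for the innermost selects (hypothetical index name).
CREATE INDEX idx_hhps_pos_cnt_player_vpip
    ON holdem_hand_player_statistics (position, cnt_players, id_player, flg_vpip);

-- Index-only scans rely on the visibility map, so vacuum the table afterwards.
VACUUM ANALYZE holdem_hand_player_statistics;
```

Putting the equality-filtered columns (position, cnt_players) first lets the planner narrow the scan to the relevant range of the index before grouping by id_player.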
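If you are not going to run the query very often, I think a better approach is to compute these inner selects ahead of time into one or two tables.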
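-- holdem is shorthand for holdem_hand_player_statistics.
-- Precompute the per-player averages once.
SELECT id_player, position, cnt_players,
       ROUND(avg(flg_vpip::int)*25) AS avg_vpip
INTO auxtable
FROM holdem
GROUP BY id_player, position, cnt_players;
-- PostgreSQL has no CLUSTERED keyword, so a plain primary key is used.
ALTER TABLE auxtable ADD CONSTRAINT pk_auxtable
    PRIMARY KEY (id_player, position, cnt_players);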
Then use it like this:
SELECT sbvpip*4 AS smallbvpip, btnvpip*4 AS buttonvpip, sum(amt_won)*400/count(*) AS winrate, count(*) AS count
FROM holdem
JOIN (
    -- Columns are qualified because id_player exists in both tables.
    SELECT holdem.id_player AS pid2, holdem.id_hand AS hid, auxtable.avg_vpip AS sbvpip
    FROM holdem
    JOIN auxtable ON auxtable.id_player = holdem.id_player
        AND auxtable.position = holdem.position
        AND auxtable.cnt_players = holdem.cnt_players
    WHERE holdem.position = 8 AND holdem.cnt_players = 6
) AS auxtable2 ON hid = id_hand