I'm new to SQL and need some help. I have 4 tables:
helmet                                arm
+------+---------+-----+--------+     +------+---------+-----+--------+
| id   | name    | def | weight |     | id   | name    | def | weight |
+------+---------+-----+--------+     +------+---------+-----+--------+
| 1    | head1   | 5   | 2.2    |     | 1    | arm1    | 4   | 2.7    |
| 2    | head2   | 6   | 2.9    |     | 2    | arm2    | 5   | 3.1    |
| 3    | head3   | 7   | 3.5    |     | 3    | arm3    | 2   | 1.8    |
+------+---------+-----+--------+     +------+---------+-----+--------+
body                                  leg
+------+---------+-----+--------+     +------+---------+-----+--------+
| id   | name    | def | weight |     | id   | name    | def | weight |
+------+---------+-----+--------+     +------+---------+-----+--------+
| 1    | body1   | 10  | 5.5    |     | 1    | leg1    | 8   | 3.5    |
| 2    | body2   | 5   | 2.4    |     | 2    | leg2    | 5   | 2.0    |
| 3    | body3   | 17  | 6.9    |     | 3    | leg3    | 8   | 1.8    |
+------+---------+-----+--------+     +------+---------+-----+--------+
I'm looking for the highest total def among the combinations whose total weight <= an input value.
For example: total weight <= 10
Query:
select
helmet.name as hname, body.name as bname,
arm.name as aname, leg.name as lname,
helmet.poise + body.poise + arm.poise + leg.poise as totalpoise,
helmet.weight + body.weight + arm.weight + leg.weight as totalweight
from
helmet
inner join
body on 1=1
inner join
arm on 1=1
inner join
leg on 1=1
where
helmet.weight + body.weight + arm.weight + leg.weight <= 10
order by
totalpoise desc
limit 5
Result:
+-------+-------+-------+-------+----------+-------------+
| hname | bname | aname | lname | totaldef | totalweight |
+-------+-------+-------+-------+----------+-------------+
| head2 | body2 | arm1 | leg3 | 23 | 9.8 |
| head1 | body2 | arm2 | leg3 | 23 | 9.5 |
| head3 | body2 | arm3 | leg3 | 22 | 9.5 |
| head1 | body2 | arm1 | leg3 | 22 | 9.1 |
| head2 | body2 | arm3 | leg3 | 21 | 8.9 |
+-------+-------+-------+-------+----------+-------------+
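The problem is that each table has about 100 rows, so the cross join produces about 100 million (100^4) candidate rows, and the query takes a very long time. I'm not sure whether the cause is my hardware, the database, or the query itself.
P.S.: I use a hard disk drive and have 8 GB of RAM. I have tested on both MySQL and PostgreSQL. UPDATE: I have not created any indexes yet.
The explain plan is here: explain plan
How long does it take? It depends on the input: on MySQL roughly several minutes to several hours, on PostgreSQL roughly 30 seconds to 2 minutes.
UPDATE 2: My tables never change. So can I store all the results in a table? Would that help?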
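UPDATE 3: I thought about partitioning. It might be much faster, but what if some [armor set] in a lower partition has a higher totaldef than an [armor set] in an upper partition? For example:
[head1,arm1,body1,leg1][totaldef 25][totalweight 9.9]
[head2,arm2,body2,leg2][totaldef 20][totalweight 11.0]
A query that only looks at the partition with total weight > 10 would miss the first [armor set], because it lives in another partition.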
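Here is a CSV file for anyone who wants to test: CSV file
UPDATE 4: I think the fastest way is to create a materialized view, but I suspect the key to performance is sorting. I don't know which sort order helps most, on the materialized view or on the index, but I sorted both and it helped a lot.
I did not expect to get this much help. Thank you.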
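Answer 0 (score: 2)
Very interesting question. I don't know of any approach specific to your case. If I were you, I would test the following: body seems to be heavier than helmet, arm and leg. So I would start from that table, then join the others one at a time, checking at each join that the running sum of weights does not exceed your input. As follows: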
SELECT helmet.name AS hname, body.name AS bname, arm.name AS aname, leg.name AS lname,
helmet.poise + body.poise + arm.poise + leg.poise AS totalpoise,
helmet.weight + body.weight + arm.weight + leg.weight AS totalweight
FROM body
INNER JOIN helmet
ON 1=1
AND body.weight + helmet.weight <= 10
INNER JOIN arm
ON 1=1
AND body.weight + helmet.weight + arm.weight <= 10
INNER JOIN leg
ON 1=1
AND body.weight + helmet.weight + arm.weight + leg.weight <= 10
WHERE body.weight <= 10
ORDER BY totalpoise DESC limit 5
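The effect of checking the running weight at every join can be illustrated outside the database. Below is a minimal Python sketch, assuming the three-row sample tables from the question (with the `def` column); the function name `top_sets` is hypothetical and the nested loops only mimic the join order, they are not what the planner actually executes:

```python
from decimal import Decimal as D

# Sample rows from the question: (name, def, weight).
HELMET = [("head1", 5, D("2.2")), ("head2", 6, D("2.9")), ("head3", 7, D("3.5"))]
BODY = [("body1", 10, D("5.5")), ("body2", 5, D("2.4")), ("body3", 17, D("6.9"))]
ARM = [("arm1", 4, D("2.7")), ("arm2", 5, D("3.1")), ("arm3", 2, D("1.8"))]
LEG = [("leg1", 8, D("3.5")), ("leg2", 5, D("2.0")), ("leg3", 8, D("1.8"))]


def top_sets(limit, n=5):
    """Return the n best (total_def, total_weight, names) within the weight limit."""
    found = []
    for b in BODY:
        if b[2] > limit:
            continue  # plays the role of WHERE body.weight <= limit
        for h in HELMET:
            w_bh = b[2] + h[2]
            if w_bh > limit:
                continue  # partial sum already too heavy: skip all arms and legs
            for a in ARM:
                w_bha = w_bh + a[2]
                if w_bha > limit:
                    continue
                for l in LEG:
                    w = w_bha + l[2]
                    if w <= limit:
                        total_def = b[1] + h[1] + a[1] + l[1]
                        found.append((total_def, w, (h[0], b[0], a[0], l[0])))
    # Highest def first; ties are broken by the lighter set.
    return sorted(found, key=lambda r: (-r[0], r[1]))[:n]


for total_def, weight, names in top_sets(D("10")):
    print(names, total_def, weight)
```

Each `continue` discards a whole subtree of combinations, which is why filtering in the join conditions can do far less work than filtering once at the end.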
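Also, as @juergen-d mentioned in the comments, an index could affect performance. You can run the query with and without an index on each weight column to see the difference.
For PostgreSQL: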
CREATE INDEX index_body_on_weight ON body(weight);
After some discussion, zerkms and Laurenz Albe agree that the following three indexes are useless and should not be used (I will run a benchmark if I have time):
CREATE INDEX index_helmet_on_weight ON helmet(weight);
CREATE INDEX index_arm_on_weight ON arm(weight);
CREATE INDEX index_leg_on_weight ON leg(weight);
Benchmark on PostgreSQL 9.3.5:
slowbs's query: 107.628 seconds
my proposed query: 12.066 seconds
my proposed query: 16.257 seconds (with only index_body_on_weight)
my proposed query: 13.217 seconds (with 4 indexes)
Benchmark conclusion: indexes do not help in this case. @zerkms and @Laurenz Albe were right.
Last but not least, please share your results.
Answer 1 (score: 2)
A materialized view with the proper index performs quite well: 1.8 seconds on my aging SSD desktop with a stock-configuration PostgreSQL:
create materialized view v as
select
h.name as hname, b.name as bname, a.name as aname, l.name as lname,
total_poise, total_weight
from
helmet h
cross join
body b
cross join
arm a
cross join
leg l
cross join lateral (
select
h.weight + b.weight + l.weight + a.weight as total_weight,
h.poise + b.poise + l.poise + a.poise as total_poise
) total
order by total_poise desc, total_weight
;
create index v_index on v (total_poise desc, total_weight);
Execute and analyze:
select *
from v
where total_weight <= 10
order by total_poise desc, total_weight
limit 5
;
hname | bname | aname | lname | total_poise | total_weight
-----------------------+--------------------------+------------------------+--------------------------+-------------+--------------
Fume Sorcerer Mask+10 | Moon Butterfly Wings+5 | Velstadt`s Gauntlets+5 | Prisoner`s Waistcloth+10 | 20 | 9.4
Fume Sorcerer Mask+10 | Lion Warrior Cape+10 | Velstadt`s Gauntlets+5 | Prisoner`s Waistcloth+10 | 20 | 9.5
Fume Sorcerer Mask+10 | Red Lion Warrior Cape+10 | Velstadt`s Gauntlets+5 | Prisoner`s Waistcloth+10 | 20 | 9.5
Fume Sorcerer Mask+10 | Moon Butterfly Wings+5 | Velstadt`s Gauntlets+5 | Lion Warrior Skirt+10 | 20 | 9.6
Fume Sorcerer Mask+10 | Moon Butterfly Wings+5 | Velstadt`s Gauntlets+5 | Moon Butterfly Skirt+10 | 20 | 9.6
explain analyze
select *
from v
where total_weight <= 10
order by total_poise desc, total_weight
limit 5
;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.57..11.71 rows=5 width=88) (actual time=1847.680..1847.694 rows=5 loops=1)
-> Index Scan using v_index on v (cost=0.57..11191615.70 rows=5020071 width=88) (actual time=1847.678..1847.691 rows=5 loops=1)
Index Cond: (total_weight <= '10'::double precision)
Planning time: 0.126 ms
Execution time: 1847.722 ms
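The plan walks v_index in (total_poise desc, total_weight) order and stops as soon as five rows pass the weight filter, so no sort is needed at query time. A rough Python analogue follows, as a sketch only: it assumes the question's three-row sample tables (with `def` standing in for poise), and `ROWS` and `query` are hypothetical names, not part of the view above:

```python
from decimal import Decimal as D
from itertools import product

# Sample rows from the question: (name, def, weight).
HELMET = [("head1", 5, D("2.2")), ("head2", 6, D("2.9")), ("head3", 7, D("3.5"))]
BODY = [("body1", 10, D("5.5")), ("body2", 5, D("2.4")), ("body3", 17, D("6.9"))]
ARM = [("arm1", 4, D("2.7")), ("arm2", 5, D("3.1")), ("arm3", 2, D("1.8"))]
LEG = [("leg1", 8, D("3.5")), ("leg2", 5, D("2.0")), ("leg3", 8, D("1.8"))]

# One-time "materialization": every combination, pre-sorted like the index
# (best total first, lighter set first among equals).
ROWS = sorted(
    (
        (h[1] + b[1] + a[1] + l[1], h[2] + b[2] + a[2] + l[2], (h[0], b[0], a[0], l[0]))
        for h, b, a, l in product(HELMET, BODY, ARM, LEG)
    ),
    key=lambda r: (-r[0], r[1]),
)


def query(limit, n=5):
    """Walk ROWS in index order and stop at the n-th row that fits the limit."""
    hits = []
    scanned = 0
    for row in ROWS:
        scanned += 1
        if row[1] <= limit:
            hits.append(row)
            if len(hits) == n:
                break  # LIMIT reached: the rest of the "index" is never read
    return hits, scanned


hits, scanned = query(D("10"))
print(scanned, "of", len(ROWS), "rows scanned")
```

The cost depends on how many heavy, high-scoring rows precede the first five matches, which is a plausible reason the query above still needs 1.8 seconds for a tight limit.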
Answer 2 (score: 2)
Since your tables never change, you can cache the intermediate data. For PostgreSQL, that could be a materialized view:
create materialized view equipments as
select
h.id as helmet_id, a.id as arm_id, b.id as body_id, l.id as leg_id,
(h.def+a.def+b.def+l.def) as total_def,
(h.weight+a.weight+b.weight+l.weight) as total_weight
from helmet as h, arm as a, body as b, leg as l;
create index i_def on equipments(total_def);
create index i_weight on equipments(total_weight);
This is a heavy one-time operation, but afterwards a query like:
select *
from equipments
where total_weight <= 10
order by total_def desc
limit 5;
will be much faster. Of course, you can join the original tables to the query above to get the details of each piece of equipment.
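To show what joining back for the details might look like, here is a small self-contained Python sketch. It is an illustration under assumptions: it uses an in-memory SQLite database as a stand-in for PostgreSQL, a plain table instead of a materialized view (SQLite has none), and the question's three-row sample data:

```python
import sqlite3

# Sample rows from the question: (id, name, def, weight).
DATA = {
    "helmet": [(1, "head1", 5, 2.2), (2, "head2", 6, 2.9), (3, "head3", 7, 3.5)],
    "arm": [(1, "arm1", 4, 2.7), (2, "arm2", 5, 3.1), (3, "arm3", 2, 1.8)],
    "body": [(1, "body1", 10, 5.5), (2, "body2", 5, 2.4), (3, "body3", 17, 6.9)],
    "leg": [(1, "leg1", 8, 3.5), (2, "leg2", 5, 2.0), (3, "leg3", 8, 1.8)],
}

con = sqlite3.connect(":memory:")
for table, rows in DATA.items():
    con.execute(
        f"CREATE TABLE {table}(id INTEGER PRIMARY KEY, name TEXT, def INTEGER, weight REAL)"
    )
    con.executemany(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", rows)

# Plain table standing in for the materialized view: ids and totals only.
con.executescript("""
CREATE TABLE equipments AS
SELECT h.id AS helmet_id, a.id AS arm_id, b.id AS body_id, l.id AS leg_id,
       h.def + a.def + b.def + l.def AS total_def,
       h.weight + a.weight + b.weight + l.weight AS total_weight
FROM helmet h, arm a, body b, leg l;
CREATE INDEX i_def ON equipments(total_def);
CREATE INDEX i_weight ON equipments(total_weight);
""")

# Pick the five best rows first, then join only those rows to the base tables.
DETAILS = """
SELECT h.name, b.name, a.name, l.name, e.total_def, e.total_weight
FROM (SELECT * FROM equipments
      WHERE total_weight <= ?
      ORDER BY total_def DESC, total_weight
      LIMIT 5) AS e
JOIN helmet h ON h.id = e.helmet_id
JOIN body b ON b.id = e.body_id
JOIN arm a ON a.id = e.arm_id
JOIN leg l ON l.id = e.leg_id
ORDER BY e.total_def DESC, e.total_weight
"""
rows = con.execute(DETAILS, (10,)).fetchall()
for row in rows:
    print(row)
```

Applying the limit inside the subquery keeps the join to five lookups per table, instead of joining every cached combination and discarding most of them afterwards.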
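If the tables do change, you can call REFRESH MATERIALIZED VIEW.
I'm not familiar with MySQL, but you can google mysql materialized view or simply use a regular table.
Yet another attempt: partitioning.
(drop materialized view equipments, if it was created in the previous attempt)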
create table equipments(
helmet_id int, arm_id int, body_id int, leg_id int,
total_weight float, total_def float);
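That is the base table. Next we create the partitions. For example, if the maximum total weight is 40, there will be four partitions: 0-10, 10-20, 20-30 and 30-40: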
create table equipments_10 (check (total_weight>0 and total_weight<=10))
inherits (equipments);
create table equipments_20 (check (total_weight>10 and total_weight<=20))
inherits (equipments);
create table equipments_30 (check (total_weight>20 and total_weight<=30))
inherits (equipments);
create table equipments_40 (check (total_weight>30))
inherits (equipments);
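Fill our table (with inheritance alone, rows inserted into the parent are not routed to the child tables; that takes a trigger or rule on equipments, or a separate insert into each child):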
insert into equipments (helmet_id, arm_id, body_id, leg_id, total_def, total_weight)
select
h.id as helmet_id, a.id as arm_id, b.id as body_id, l.id as leg_id,
(h.def+a.def+b.def+l.def) as total_def,
(h.weight+a.weight+b.weight+l.weight) as total_weight
from helmet as h, arm as a, body as b, leg as l;
Create plenty of indexes to give PostgreSQL a chance to choose the most efficient execution plan:
create index i_equip_total_def on equipments(total_def);
create index i_equip_total_weight on equipments(total_weight);
create index i_equip_10_total_def on equipments_10(total_def);
create index i_equip_10_total_weight on equipments_10(total_weight);
create index i_equip_20_total_def on equipments_20(total_def);
create index i_equip_20_total_weight on equipments_20(total_weight);
create index i_equip_30_total_def on equipments_30(total_def);
create index i_equip_30_total_weight on equipments_30(total_weight);
create index i_equip_40_total_def on equipments_40(total_def);
create index i_equip_40_total_weight on equipments_40(total_weight);
Finally, gather statistics on the data:
analyze equipments;
analyze equipments_10;
analyze equipments_20;
analyze equipments_30;
analyze equipments_40;
The query is the same as in the previous attempt.
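On the worry raised in the question's Update 3: a result can only be missed if the query skips a partition that may still hold qualifying rows. PostgreSQL's constraint exclusion only skips child tables whose check constraint contradicts the WHERE clause, so for total_weight <= 12 both the 0-10 and the 10-20 partitions are read. A small Python sketch of that rule, assuming the question's three-row sample tables and two hypothetical partitions (`BOUNDS`, `PARTS` and `query` are illustrative names):

```python
from decimal import Decimal as D
from itertools import product

# Sample rows from the question: (name, def, weight).
HELMET = [("head1", 5, D("2.2")), ("head2", 6, D("2.9")), ("head3", 7, D("3.5"))]
BODY = [("body1", 10, D("5.5")), ("body2", 5, D("2.4")), ("body3", 17, D("6.9"))]
ARM = [("arm1", 4, D("2.7")), ("arm2", 5, D("3.1")), ("arm3", 2, D("1.8"))]
LEG = [("leg1", 8, D("3.5")), ("leg2", 5, D("2.0")), ("leg3", 8, D("1.8"))]

# Every combination as (total_def, total_weight, names).
ALL = [
    (h[1] + b[1] + a[1] + l[1], h[2] + b[2] + a[2] + l[2], (h[0], b[0], a[0], l[0]))
    for h, b, a, l in product(HELMET, BODY, ARM, LEG)
]

# Hypothetical partitions, each holding rows with low < total_weight <= high.
BOUNDS = [(D("0"), D("10")), (D("10"), D("20"))]
PARTS = {bound: [r for r in ALL if bound[0] < r[1] <= bound[1]] for bound in BOUNDS}


def query(limit, n=5):
    """Top n by def among rows with weight <= limit, reading only useful partitions."""
    hits = []
    for (low, _high), rows in PARTS.items():
        if low >= limit:
            continue  # every row here weighs more than the limit: safe to skip
        hits.extend(r for r in rows if r[1] <= limit)
    return sorted(hits, key=lambda r: (-r[0], r[1], r[2]))[:n]


print(query(D("12")))
```

Lower partitions are never skipped for a "<=" filter, so a light set with a high def is still found; only partitions that lie entirely above the limit are pruned.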
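PS: Here is my test, in case anyone wants to try it.
PPS: In my test, every query took less than 0.5 milliseconds regardless of the parameter (on my prehistoric hardware).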
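Answer 3 (score: 1)
Just for fun and completeness: a recursive solution on a unified table. It may not be the fastest, but it could win if the tables get bigger and indexes can be used. (Trivial examples like 3*3*3*3 usually result in hash-join plans, or even nested table scans.)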
-- the data
CREATE TABLE helmet(id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO helmet(id, name, poise, weight) VALUES
( 1, 'head1', 5, 2.2) ,( 2, 'head2', 6, 2.9) ,( 3, 'head3', 7, 3.5) ;
CREATE TABLE body (id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO body(id, name, poise, weight) VALUES
( 1, 'body1', 10, 5.5) ,( 2, 'body2', 5 , 2.4) ,( 3, 'body3', 17, 6.9) ;
CREATE TABLE arm (id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO arm(id, name, poise, weight) VALUES
( 1, 'arm1', 4, 2.7) ,( 2, 'arm2', 5, 3.1) ,( 3, 'arm3', 2, 1.8) ;
CREATE TABLE leg (id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO leg(id, name, poise, weight) VALUES
( 1, 'leg1', 8, 3.5) ,( 2, 'leg2', 5, 2.0) ,( 3, 'leg3', 8, 1.8) ;
-- combine the four tables into one
CREATE table allgear AS
SELECT 1 AS gid, 'helmet' AS gear, h.id, h.name, h.poise, h.weight FROM helmet h
UNION ALL
SELECT 2 AS gid, 'body' AS gear, b.id, b.name, b.poise, b.weight FROM body b
UNION ALL
SELECT 3 AS gid, 'arm' AS gear, a.id, a.name, a.poise, a.weight FROM arm a
UNION ALL
SELECT 4 AS gid, 'leg' AS gear, l.id, l.name, l.poise, l.weight FROM leg l
;
-- add some structure ...
ALTER TABLE allgear ADD PRIMARY KEY(gid, id);
CREATE INDEX ON allgear(gid, weight);
VACUUM ANALYZE allgear;
-- SELECT * FROM allgear ORDER by gid, id;
-- Recursive query with some pruning on the partial results.
-- EXPLAIN ANALYZE
WITH recursive rrr AS (
SELECT gid AS gid
, ARRAY[ name] AS arr
, poise AS totpoise
, weight AS totweight
FROM allgear
WHERE gid = 1
UNION ALL
SELECT ag.gid
, rrr.arr || ARRAY[ag.name] AS arr
, rrr.totpoise +ag.poise AS totpoise
, (rrr.totweight +ag.weight)::decimal(4,2) AS totweight
FROM allgear ag
JOIN rrr ON ag.gid = rrr.gid +1 AND (rrr.totweight + ag.weight)::DECIMAL(4,2) <= 10.0::DECIMAL(4,2)
)
SELECT * FROM rrr
WHERE gid = 4 -- the gid of the final one
ORDER BY totweight DESC
LIMIT 5
;
Result:
gid | arr | totpoise | totweight
-----+-------------------------+----------+-----------
4 | {head2,body2,arm1,leg2} | 20 | 10.00
4 | {head1,body2,arm3,leg1} | 20 | 9.90
4 | {head2,body2,arm1,leg3} | 23 | 9.80
4 | {head3,body2,arm3,leg2} | 19 | 9.70
4 | {head1,body2,arm2,leg2} | 20 | 9.70
(5 rows)
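Note: I get more combinations, probably because I used DECIMAL(4,2) instead of a floating-point type.
Extra: if we know the minimum weight that the remaining levels (gear types) will still add, we can prune further, even at the lower levels. I added an extra table for this.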
CREATE TABLE minima AS
SELECT gid, MIN(weight) AS mimi
FROM allgear
GROUP BY gid;
-- add an extra level ...
INSERT INTO minima(gid, mimi) VALUES (5, 0.0);
-- EXPLAIN ANALYZE
WITH recursive rrr AS (
SELECT gid AS gid
, ARRAY[ name] AS arr
, poise AS totpoise
, weight AS totweight
FROM allgear
WHERE gid = 1
UNION ALL
SELECT ag.gid
, rrr.arr || ARRAY[ag.name] AS arr
, rrr.totpoise +ag.poise AS totpoise
, (rrr.totweight +ag.weight)::decimal(4,2) AS totweight
FROM allgear ag
JOIN rrr ON ag.gid = rrr.gid+1
-- Do some extra pruning: Partial sum + the missing parts should not sum up to more than 10
JOIN LATERAL ( SELECT SUM(mimi) AS debt
FROM minima
WHERE gid > ag.gid
) susu ON (susu.debt +rrr.totweight + ag.weight)::DECIMAL(4,2) <= 10.0::DECIMAL(4,2)
)
SELECT * FROM rrr
WHERE gid = 4
ORDER BY totweight DESC
LIMIT 5
;
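The extra pruning is a branch-and-bound step: a partial set is dropped as soon as its weight plus the lightest possible remaining gear exceeds the limit. A minimal Python sketch of the same idea, assuming the sample rows inserted above; `min_rest` plays the role of the minima table, and the names `search` and `visited` are hypothetical:

```python
from decimal import Decimal as D

# One list per gear level, in gid order: (name, poise, weight).
LEVELS = [
    [("head1", 5, D("2.2")), ("head2", 6, D("2.9")), ("head3", 7, D("3.5"))],
    [("body1", 10, D("5.5")), ("body2", 5, D("2.4")), ("body3", 17, D("6.9"))],
    [("arm1", 4, D("2.7")), ("arm2", 5, D("3.1")), ("arm3", 2, D("1.8"))],
    [("leg1", 8, D("3.5")), ("leg2", 5, D("2.0")), ("leg3", 8, D("1.8"))],
]

# min_rest[i] = least weight that levels i..end must still add.
# The extra entry past the last level is 0, like the (5, 0.0) row in minima.
min_rest = [D("0")] * (len(LEVELS) + 1)
for i in range(len(LEVELS) - 1, -1, -1):
    min_rest[i] = min_rest[i + 1] + min(w for _, _, w in LEVELS[i])


def search(limit):
    """Return (complete sets within the limit, number of partial sets expanded)."""
    results = []
    visited = 0

    def extend(level, names, poise, weight):
        nonlocal visited
        if level == len(LEVELS):
            results.append((names, poise, weight))
            return
        for name, p, w in LEVELS[level]:
            # Prune: even the lightest remaining gear would break the limit.
            if weight + w + min_rest[level + 1] > limit:
                continue
            visited += 1
            extend(level + 1, names + [name], poise + p, weight + w)

    extend(0, [], 0, D("0"))
    return results, visited


results, visited = search(D("10"))
for row in sorted(results, key=lambda r: -r[2])[:5]:
    print(row)
```

With exact decimals this keeps the set that weighs exactly 10.0, matching the DECIMAL(4,2) note above.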