Is it possible to make this query faster?

Time: 2017-02-18 08:06:49

Tags: mysql sql database algorithm postgresql

I'm new to SQL and need some help. I have 4 tables:

helmet                                  arm
+------+---------+-----+--------+       +------+---------+-----+--------+
|  id  |   name  | def | weight |       |  id  |   name  | def | weight |
+------+---------+-----+--------+       +------+---------+-----+--------+
|   1  |  head1  |  5  |   2.2  |       |   1  |   arm1  |  4  |   2.7  |
|   2  |  head2  |  6  |   2.9  |       |   2  |   arm2  |  5  |   3.1  |
|   3  |  head3  |  7  |   3.5  |       |   3  |   arm3  |  2  |   1.8  |
+------+---------+-----+--------+       +------+---------+-----+--------+

body                                    leg
+------+---------+-----+--------+       +------+---------+-----+--------+
|  id  |   name  | def | weight |       |  id  |   name  | def | weight |
+------+---------+-----+--------+       +------+---------+-----+--------+
|   1  |  body1  |  10 |   5.5  |       |   1  |   leg1  |  8  |   3.5  |
|   2  |  body2  |  5  |   2.4  |       |   2  |   leg2  |  5  |   2.0  |
|   3  |  body3  |  17 |   6.9  |       |   3  |   leg3  |  8  |   1.8  |
+------+---------+-----+--------+       +------+---------+-----+--------+

I'm looking for the highest total def where the total weight <= an input value, for example: total weight <= 10

Query:

select 
    helmet.name as hname, body.name as bname, 
    arm.name as aname, leg.name as lname,
    helmet.poise + body.poise + arm.poise + leg.poise as totalpoise, 
    helmet.weight + body.weight + arm.weight + leg.weight as totalweight 
from 
    helmet 
inner join 
    body on 1=1
inner join 
    arm on 1=1
inner join 
    leg on 1=1 
where 
    helmet.weight + body.weight + arm.weight + leg.weight <= 10
order by 
    totalpoise desc 
limit 5

Result:

+-------+-------+-------+-------+----------+-------------+
| hname | bname | aname | lname | totaldef | totalweight |
+-------+-------+-------+-------+----------+-------------+
| head2 | body2 |  arm1 |  leg3 |    23    |     9.8     |
| head1 | body2 |  arm2 |  leg3 |    23    |     9.5     |
| head3 | body2 |  arm3 |  leg3 |    22    |     9.5     |
| head1 | body2 |  arm1 |  leg3 |    22    |     9.1     |
| head2 | body2 |  arm3 |  leg3 |    21    |     8.9     |
+-------+-------+-------+-------+----------+-------------+

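The problem is that each table has about 100 rows, so there are 100m+ possible result rows (100 x 100 x 100 x 100). The query takes a very long time. I'm not sure whether that is down to my hardware, the database, or the query itself.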
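The size of that search space can be checked before running the full join. This is a minimal sketch, assuming the four tables shown above; the alias `combinations` is only illustrative:

```sql
-- Number of rows the unfiltered four-way cross join would produce.
-- With about 100 rows per table this is 100 * 100 * 100 * 100 = 100,000,000.
SELECT (SELECT count(*) FROM helmet)
     * (SELECT count(*) FROM body)
     * (SELECT count(*) FROM arm)
     * (SELECT count(*) FROM leg) AS combinations;
```

Each subquery only counts one small table, so this returns immediately, unlike the join it describes.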

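P.S.: I use an HDD and have 8 GB of RAM. I have tested on both MySQL and PostgreSQL.

UPDATE: I haven't created any indexes yet.

Is this the explain plan? explain plan

How long does it take? It depends on the input. On MySQL roughly a few minutes to a few hours; on PostgreSQL roughly 30 seconds to 2 minutes.

UPDATE 2: My tables never change. So could I store all the results in a table? Would that help?

UPDATE 3: I thought about partitioning. It could be much faster, but there is a problem if some [armor set] in a lower partition has a higher totaldef than an [armor set] in an upper partition. For example: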

[head1,arm1,body1,leg1][totaldef 25][totalweight 9.9]
[head2,arm2,body2,leg2][totaldef 20][totalweight 11.0]

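So the partition for total weight > 10 will miss that [armor set], because it is in the other partition.

Here is a CSV file for anyone who wants to test. CSV file

UPDATE 4: I think the fastest way is to create a materialized view, but I think the key to performance is sorting. I don't know which kind of sorting helps a materialized view or an index, but I sorted both of them and it helped a lot.

I didn't expect to get this much help. Thank you.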

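4 Answers:

Answer 0: (score: 2)

Very interesting question. I don't know of any special technique for your case. If I were you, I would test the following: body items seem to be heavier than helmet, arm and leg items. So I would query that table first, then add each join in turn while making sure the running sum of weights does not exceed your input. Like this: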

SELECT helmet.name AS hname, body.name AS bname, arm.name AS aname, leg.name AS lname,
helmet.poise + body.poise + arm.poise + leg.poise AS totalpoise, 
helmet.weight + body.weight + arm.weight + leg.weight AS totalweight 
FROM body 
    INNER JOIN helmet 
    ON 1=1 
        AND body.weight + helmet.weight <= 10
    INNER JOIN arm 
    ON 1=1 
        AND body.weight + helmet.weight + arm.weight <= 10
    INNER JOIN leg 
    ON 1=1 
        AND body.weight + helmet.weight + arm.weight + leg.weight <= 10
WHERE body.weight <= 10
ORDER BY totalpoise DESC limit 5

Also, as @juergen-d mentioned in the comments, indexes can affect performance. You can compare the timings with and without an index on each weight column.

For PostgreSQL:

CREATE INDEX index_body_on_weight ON body(weight);

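After some discussion with zerkms and Laurenz Albe, they agree that the following three indexes are useless and should not be used (I will benchmark them if I have time):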

CREATE INDEX index_helmet_on_weight ON helmet(weight);
CREATE INDEX index_arm_on_weight ON arm(weight);
CREATE INDEX index_leg_on_weight ON leg(weight);

Benchmark on PostgreSQL 9.3.5:

 slowbs's Query : 107.628 second
 my proposition Query : 12.066 second
 my proposition Query : 16.257 second (with only index_body_on_weight)
 my proposition Query : 13.217 second (with 4 indexes)

Benchmark conclusion: indexes do not help in this case. @zerkms and @Laurenz Albe were right.

Last but not least, please share your results.
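To compare timings with and without the indexes, one option is PostgreSQL's EXPLAIN ANALYZE, which runs the statement and prints the plan it used together with the measured times. This is a sketch of how the proposed query could be measured, not a record of how the benchmark above was taken:

```sql
-- EXPLAIN ANALYZE executes the query and reports the actual plan and timings.
EXPLAIN ANALYZE
SELECT helmet.name AS hname, body.name AS bname, arm.name AS aname, leg.name AS lname,
       helmet.poise + body.poise + arm.poise + leg.poise AS totalpoise,
       helmet.weight + body.weight + arm.weight + leg.weight AS totalweight
FROM body
    INNER JOIN helmet ON body.weight + helmet.weight <= 10
    INNER JOIN arm    ON body.weight + helmet.weight + arm.weight <= 10
    INNER JOIN leg    ON body.weight + helmet.weight + arm.weight + leg.weight <= 10
WHERE body.weight <= 10
ORDER BY totalpoise DESC
LIMIT 5;
```

If the plan shows only sequential scans on tables this small, the planner has decided the indexes are not worth using, which would be consistent with the conclusion above.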

Answer 1: (score: 2)

A materialized view with the proper index performs quite well: 1.8 seconds on my aging SSD desktop with a stock-configuration PostgreSQL:

create materialized view v as
select
    h.name as hname, b.name as bname, a.name as aname, l.name as lname,
    total_poise, total_weight
from
    helmet h
    cross join
    body b
    cross join
    arm a
    cross join
    leg l
    cross join lateral (
        select
            h.weight + b.weight + l.weight + a.weight as total_weight,
            h.poise + b.poise + l.poise + a.poise as total_poise
    ) total
order by total_poise desc, total_weight
;

create index v_index on v (total_poise desc, total_weight);

Execution and analysis:

select *
from v
where total_weight <= 10
order by total_poise desc, total_weight
limit 5
;
         hname         |          bname           |         aname          |          lname           | total_poise | total_weight 
-----------------------+--------------------------+------------------------+--------------------------+-------------+--------------
 Fume Sorcerer Mask+10 | Moon Butterfly Wings+5   | Velstadt`s Gauntlets+5 | Prisoner`s Waistcloth+10 |          20 |          9.4
 Fume Sorcerer Mask+10 | Lion Warrior Cape+10     | Velstadt`s Gauntlets+5 | Prisoner`s Waistcloth+10 |          20 |          9.5
 Fume Sorcerer Mask+10 | Red Lion Warrior Cape+10 | Velstadt`s Gauntlets+5 | Prisoner`s Waistcloth+10 |          20 |          9.5
 Fume Sorcerer Mask+10 | Moon Butterfly Wings+5   | Velstadt`s Gauntlets+5 | Lion Warrior Skirt+10    |          20 |          9.6
 Fume Sorcerer Mask+10 | Moon Butterfly Wings+5   | Velstadt`s Gauntlets+5 | Moon Butterfly Skirt+10  |          20 |          9.6


explain analyze
select *
from v
where total_weight <= 10
order by total_poise desc, total_weight
limit 5
;
                                                             QUERY PLAN                                                              
-------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.57..11.71 rows=5 width=88) (actual time=1847.680..1847.694 rows=5 loops=1)
   ->  Index Scan using v_index on v  (cost=0.57..11191615.70 rows=5020071 width=88) (actual time=1847.678..1847.691 rows=5 loops=1)
         Index Cond: (total_weight <= '10'::double precision)
 Planning time: 0.126 ms
 Execution time: 1847.722 ms

Answer 2: (score: 2)

Since your tables never change, you can cache the intermediate data. For PostgreSQL this could be a materialized view:

create materialized view equipments as
  select
    h.id as helmet_id, a.id as arm_id, b.id as body_id, l.id as leg_id,
    (h.def+a.def+b.def+l.def) as total_def,
    (h.weight+a.weight+b.weight+l.weight) as total_weight
  from helmet as h, arm as a, body as b, leg as l;
create index i_def on equipments(total_def);
create index i_weight on equipments(total_weight);

This is a heavy one-time operation, but afterwards a query like this:

select *
from equipments
where total_weight <= 10
order by total_def desc
limit 5;

will be much faster. Of course, you can join the base tables to the query above to get detailed information about the equipment.
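A possible shape for that join is sketched below. It assumes the `equipments` view defined above and that `id` is unique in each base table; the alias `top5` is illustrative only. The top rows are picked first, so only five lookups per base table are needed:

```sql
select
    h.name as hname, a.name as aname, b.name as bname, l.name as lname,
    top5.total_def, top5.total_weight
from (
    -- pick the best five combinations from the precomputed data first
    select *
    from equipments
    where total_weight <= 10
    order by total_def desc
    limit 5
) as top5
join helmet as h on h.id = top5.helmet_id
join arm    as a on a.id = top5.arm_id
join body   as b on b.id = top5.body_id
join leg    as l on l.id = top5.leg_id
order by top5.total_def desc;
```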

If the tables do change, you can call REFRESH MATERIALIZED VIEW.

I'm not familiar with MySQL, but you could google "mysql materialized view" or simply use a regular table.
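For the regular-table route, a minimal sketch follows. MySQL has no native materialized views, so a plain table filled once plays the same role. The statements reuse the names from the view above and have not been benchmarked on MySQL:

```sql
-- Precompute every combination once into an ordinary table.
create table equipments as
  select
    h.id as helmet_id, a.id as arm_id, b.id as body_id, l.id as leg_id,
    (h.def + a.def + b.def + l.def) as total_def,
    (h.weight + a.weight + b.weight + l.weight) as total_weight
  from helmet as h, arm as a, body as b, leg as l;

-- Same indexes as on the materialized view.
create index i_def on equipments(total_def);
create index i_weight on equipments(total_weight);
```

Unlike a materialized view, there is no refresh command: if the base tables ever change, the table has to be emptied and refilled by hand.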

Another attempt: partitioning

(drop materialized view equipments, if it was created in the previous attempt)

create table equipments(
  helmet_id int, arm_id int, body_id int, leg_id int,
  total_weight float, total_def float);

That is the base table. Next we create the partitions. For example, if the maximum total weight is 40, there will be partitions for 0-10, 10-20, 20-30 and 30-40:

create table equipments_10 (check (total_weight>0 and total_weight<=10))
  inherits (equipments);
create table equipments_20 (check (total_weight>10 and total_weight<=20))
  inherits (equipments);
create table equipments_30 (check (total_weight>20 and total_weight<=30))
  inherits (equipments);
create table equipments_40 (check (total_weight>30))
  inherits (equipments);

Fill our tables:

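-- name the target columns explicitly: the table declares total_weight before total_def
-- note: without a trigger or rule, rows inserted into the parent stay in the parent table
insert into equipments (helmet_id, arm_id, body_id, leg_id, total_def, total_weight)
  select
    h.id as helmet_id, a.id as arm_id, b.id as body_id, l.id as leg_id,
    (h.def+a.def+b.def+l.def) as total_def,
    (h.weight+a.weight+b.weight+l.weight) as total_weight
  from helmet as h, arm as a, body as b, leg as l;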
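With inheritance-based partitioning, PostgreSQL does not move rows from the parent into the child tables by itself; that needs a trigger or rule on the parent. One way to sidestep it, sketched here as an alternative to the single insert above, is to fill each child table directly with its own weight range:

```sql
-- Fill the 0-10 partition directly; the WHERE clause mirrors its CHECK constraint.
insert into equipments_10 (helmet_id, arm_id, body_id, leg_id, total_def, total_weight)
  select h.id, a.id, b.id, l.id,
         h.def + a.def + b.def + l.def,
         h.weight + a.weight + b.weight + l.weight
  from helmet as h, arm as a, body as b, leg as l
  where h.weight + a.weight + b.weight + l.weight > 0
    and h.weight + a.weight + b.weight + l.weight <= 10;
-- Repeat with the matching range for equipments_20, equipments_30 and equipments_40.
```

Use either this or the insert into the parent, not both, or every combination will be stored twice.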

Create plenty of indexes to give PostgreSQL a chance to choose the most efficient execution plan:

create index i_equip_total_def on equipments(total_def);
create index i_equip_total_weight on equipments(total_weight); 
create index i_equip_10_total_def on equipments_10(total_def);
create index i_equip_10_total_weight on equipments_10(total_weight); 
create index i_equip_20_total_def on equipments_20(total_def);
create index i_equip_20_total_weight on equipments_20(total_weight); 
create index i_equip_30_total_def on equipments_30(total_def);
create index i_equip_30_total_weight on equipments_30(total_weight); 
create index i_equip_40_total_def on equipments_40(total_def);
create index i_equip_40_total_weight on equipments_40(total_weight);

Finally, compute statistics on the data:

analyze equipments;
analyze equipments_10;
analyze equipments_20;
analyze equipments_30;
analyze equipments_40;

The query is similar to the one in the previous attempt.
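Spelled out, and wrapped in EXPLAIN to check which partitions are touched, it could look like the sketch below. The `constraint_exclusion` setting shown is already the PostgreSQL default and is set only to make the dependency explicit:

```sql
-- Let the planner skip child tables whose CHECK constraint contradicts the WHERE clause.
set constraint_exclusion = partition;

explain
select *
from equipments
where total_weight <= 10
order by total_def desc
limit 5;
```

If exclusion works as intended, the plan should mention only `equipments` and `equipments_10`, since the CHECK constraints on the other three children rule out any row with total_weight <= 10.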

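PS: Here is my test, if anyone wants to try it.

PPS: In my tests every query, regardless of the parameter, took less than 0.5 ms (on my prehistoric hardware).

Answer 3: (score: 1)

Just for fun and completeness: a recursive solution on a unified table. It may not be the fastest, but it could win if the tables get bigger and indexes can be used. (Trivial examples like 3*3*3*3 typically result in hash join plans, or even nested table scans.)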

-- the data
CREATE TABLE helmet(id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO helmet(id, name, poise, weight) VALUES
(   1, 'head1', 5, 2.2) ,(   2, 'head2', 6, 2.9) ,(   3, 'head3', 7, 3.5) ;

CREATE TABLE body (id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO body(id, name, poise, weight) VALUES
 (   1, 'body1', 10, 5.5) ,(   2, 'body2', 5 , 2.4) ,(   3, 'body3', 17, 6.9) ;

CREATE TABLE arm (id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO arm(id, name, poise, weight) VALUES
 (   1, 'arm1', 4, 2.7) ,(   2, 'arm2', 5, 3.1) ,(   3, 'arm3', 2, 1.8) ;

CREATE TABLE leg (id INTEGER NOT NULL PRIMARY KEY, name text, poise INTEGER NOT NULL DEFAULT 0, weight DECIMAL (4,2) );
INSERT INTO leg(id, name, poise, weight) VALUES
 (   1, 'leg1', 8, 3.5) ,(   2, 'leg2', 5, 2.0) ,(   3, 'leg3', 8, 1.8) ;


-- combine the four tables into one
CREATE table allgear AS
SELECT 1 AS gid, 'helmet' AS gear, h.id, h.name, h.poise, h.weight FROM helmet h
UNION ALL
SELECT 2 AS gid, 'body' AS gear, b.id, b.name, b.poise, b.weight FROM body b
UNION ALL
SELECT 3 AS gid, 'arm' AS gear, a.id, a.name, a.poise, a.weight FROM arm a
UNION ALL
SELECT 4 AS gid, 'leg' AS gear, l.id, l.name, l.poise, l.weight FROM leg l
        ;

-- add some structure ...
ALTER TABLE allgear ADD PRIMARY KEY(gid, id);
CREATE INDEX ON allgear(gid, weight);
VACUUM ANALYZE allgear;

-- SELECT * FROM allgear ORDER by gid, id;


-- Recursive query with some pruning on the partial results.
-- EXPLAIN ANALYZE
WITH recursive rrr AS (
        SELECT gid AS gid
                , ARRAY[ name] AS arr
                , poise AS totpoise
                , weight AS totweight
        FROM allgear
        WHERE gid = 1
        UNION ALL
        SELECT ag.gid
                , rrr.arr || ARRAY[ag.name] AS arr
                , rrr.totpoise +ag.poise AS totpoise
                , (rrr.totweight +ag.weight)::decimal(4,2) AS totweight
        FROM allgear ag
        JOIN rrr ON ag.gid = rrr.gid +1 AND (rrr.totweight + ag.weight)::DECIMAL(4,2) <= 10.0::DECIMAL(4,2)
        )
SELECT * FROM rrr
WHERE gid = 4 -- the gid of the final one
ORDER BY totweight DESC
LIMIT 5
        ;

Result:

 gid |           arr           | totpoise | totweight 
-----+-------------------------+----------+-----------
   4 | {head2,body2,arm1,leg2} |       20 |     10.00
   4 | {head1,body2,arm3,leg1} |       20 |      9.90
   4 | {head2,body2,arm1,leg3} |       23 |      9.80
   4 | {head3,body2,arm3,leg2} |       19 |      9.70
   4 | {head1,body2,arm2,leg2} |       20 |      9.70
(5 rows)

Note: I get more combinations, probably because I used DECIMAL(4,2) instead of a floating point type.
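The general effect behind that guess can be seen with a generic example, unrelated to the armor data: binary floating point cannot represent most decimal fractions exactly, so a sum that lands exactly on a limit in DECIMAL may land slightly above it in float. Whether this is what actually changed the result set here has not been verified.

```sql
-- In float8, 0.1 + 0.2 is slightly above 0.3, so the first comparison is false;
-- in numeric the sum is exactly 0.3, so the second comparison is true.
SELECT 0.1::float8  + 0.2::float8  <= 0.3::float8  AS float_within_limit,
       0.1::numeric + 0.2::numeric <= 0.3::numeric AS numeric_within_limit;
```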

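Extra: if we know the minimum weight that the remaining levels (gear types) will add, we can do some extra pruning (even at the lower levels). I added an extra table for this.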

CREATE TABLE minima AS
SELECT gid, MIN(weight) AS mimi
FROM allgear
GROUP BY gid;
-- add an extra level ...
INSERT INTO minima(gid, mimi) VALUES (5, 0.0);

-- EXPLAIN ANALYZE
WITH recursive rrr AS (
        SELECT gid AS gid
                , ARRAY[ name] AS arr
                , poise AS totpoise
                , weight AS totweight
        FROM allgear
        WHERE gid = 1
        UNION ALL
        SELECT ag.gid
                , rrr.arr || ARRAY[ag.name] AS arr
                , rrr.totpoise +ag.poise AS totpoise
                , (rrr.totweight +ag.weight)::decimal(4,2) AS totweight
        FROM allgear ag
        JOIN rrr ON ag.gid = rrr.gid+1
        -- Do some extra pruning: Partial sum + the missing parts should not sum up to more than 10
        JOIN LATERAL ( SELECT SUM(mimi) AS debt
                FROM minima
                WHERE gid > ag.gid
                ) susu ON (susu.debt +rrr.totweight + ag.weight)::DECIMAL(4,2) <= 10.0::DECIMAL(4,2)

        )
SELECT * FROM rrr
WHERE gid = 4
ORDER BY totweight DESC
LIMIT 5
        ;