Lazy order by / evaluation

Date: 2017-02-08 21:40:32

Tags: sql database postgresql database-design

Edit

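It seems that the result of a pure function can be memoized, stored as a column in the table, and indexed; however, my specific use case (semver.satisfies) requires a more general solution: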

create table Submissions (
    version text,
    created_at timestamp
);

create index Submissions_1 on Submissions (created_at);

My query would look like this:

select * from Submissions
where
    created_at <= '2016-07-12' and
    satisfies(version, '>=1.2.3 <4.5.6')
order by created_at desc
limit 1;

I can't really use the same memoization technique here, since the range argument changes from query to query.

Original

I have a table that stores text data along with its creation date:

create table Submissions (
    content text,
    created_at timestamp
);

create index Submissions_1 on Submissions (created_at);

Given a checksum and a reference date, I want to get the latest Submission whose content field matches the checksum:

select * from Submissions
where
    created_at <= '2016-07-12' and
    expensive_chksm(content) = '77ac76dc0d4622ba9aa795acafc05f1e'
order by created_at desc
limit 1;

This works, but it is slow. What Postgres ends up doing is computing the checksum of every row and only then performing the order by:

 Limit  (cost=270834.18..270834.18 rows=1 width=32) (actual time=1132.898..1132.898 rows=1 loops=1)
   ->  Sort  (cost=270834.18..271561.27 rows=290836 width=32) (actual time=1132.898..1132.898 rows=1 loops=1)
         Sort Key: created_at DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Seq Scan on installation  (cost=0.00..269380.00 rows=290836 width=32) (actual time=0.118..1129.961 rows=17305 loops=1)
               Filter: created_at <= '2016-07-12' AND expensive_chksm(content) = '77ac76dc0d4622ba9aa795acafc05f1e'
               Rows Removed by Filter: 982695
 Planning time: 0.066 ms
 Execution time: 1246.941 ms

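Without the order by, it is a sub-millisecond operation, because Postgres knows I only want the first result. The only difference is that I want Postgres to start searching from the most recent date.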

Ideally, Postgres would:

  1. filter by created_at
  2. sort by created_at, descending
  3. return the first row where the checksum matches
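The three steps above can be spelled out procedurally. The following is a minimal sketch, assuming the original Submissions table and the expensive_chksm function from the question; the function name latest_matching_submission and its parameter names are hypothetical. It only pays off if the planner reads the inner query through the created_at index rather than sorting every row first, so timings would need to be checked on real data.

```sql
-- Hypothetical helper: visits rows newest-first and stops at the first match.
CREATE FUNCTION latest_matching_submission(ref_date timestamp, wanted text)
RETURNS SETOF Submissions AS $$
DECLARE
    r Submissions%ROWTYPE;
BEGIN
    -- Steps 1 and 2: filter by created_at and visit rows in descending order.
    FOR r IN
        SELECT *
        FROM Submissions
        WHERE created_at <= ref_date
        ORDER BY created_at DESC
    LOOP
        -- Step 3: the checksum is computed only for rows the loop reaches.
        IF expensive_chksm(r.content) = wanted THEN
            RETURN NEXT r;
            RETURN;  -- stop at the first match
        END IF;
    END LOOP;
    RETURN;  -- no matching row: empty result
END;
$$ LANGUAGE plpgsql;

SELECT *
FROM latest_matching_submission('2016-07-12', '77ac76dc0d4622ba9aa795acafc05f1e');
```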

I tried writing the query with an inline view, but explain analyze shows that it gets rewritten into what I already have above.

4 Answers:

Answer 0: (score: 2)

You can create a single index covering both fields:

create index Submissions_1 on Submissions (created_at DESC, expensive_chksm(content));

                                                                        QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.15..8.16 rows=1 width=40) (actual time=0.004..0.004 rows=0 loops=1)
   ->  Index Scan using submissions_1 on submissions  (cost=0.15..16.17 rows=2 width=40) (actual time=0.002..0.002 rows=0 loops=1)
         Index Cond: ((created_at <= '2016-07-12 00:00:00'::timestamp without time zone) AND ((content)::text = '77ac76dc0d4622ba9aa795acafc05f1e'::text))
 Planning time: 0.414 ms
 Execution time: 0.036 ms

Using DESC in the index is also important.

Update:

To store and compare versions, you can use int[]:

create table Submissions (
    version int[],
    created_at timestamp
);

INSERT INTO Submissions SELECT ARRAY [ (random() * 10)::int2, (random() * 10)::int2, (random() * 10)::int2], '2016-01-01'::timestamp + ('1 hour')::interval * random() * 10000 FROM generate_series(1, 1000000);

create index Submissions_1 on Submissions (created_at DESC, version);

EXPLAIN ANALYZE select * from Submissions
where
    created_at <= '2016-07-12'
    AND version <= ARRAY [5,2,3]
    AND version > ARRAY [1,2,3]
order by created_at desc
limit 1;

                                                                             QUERY PLAN                                                                              
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.42..13.24 rows=1 width=40) (actual time=0.074..0.075 rows=1 loops=1)
   ->  Index Only Scan using submissions_1 on submissions  (cost=0.42..21355.76 rows=1667 width=40) (actual time=0.073..0.073 rows=1 loops=1)
         Index Cond: ((created_at <= '2016-07-12 00:00:00'::timestamp without time zone) AND (version <= '{5,2,3}'::integer[]) AND (version > '{1,2,3}'::integer[]))
         Heap Fetches: 1
 Planning time: 3.019 ms
 Execution time: 0.100 ms
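To make the int[] idea concrete, here is a small sketch of how a dotted version string could be converted to an array and checked against the range from the question ('>=1.2.3 <4.5.6'). It assumes plain MAJOR.MINOR.PATCH versions with no pre-release or build suffixes, and it covers only simple comparison ranges, not the full semver.satisfies grammar.

```sql
-- Postgres compares arrays element by element, so int[] ordering matches
-- numeric version ordering for plain MAJOR.MINOR.PATCH versions.
SELECT v AS version_text,
       string_to_array(v, '.')::int[] AS version_array,
       string_to_array(v, '.')::int[] >= ARRAY[1,2,3]
   AND string_to_array(v, '.')::int[] <  ARRAY[4,5,6] AS in_range
FROM (VALUES ('1.2.3'), ('1.10.0'), ('4.5.6'), ('0.9.9')) AS t(v);
-- in_range is true for 1.2.3 and 1.10.0, false for 4.5.6 and 0.9.9
```

Doing the conversion once at insert time and storing the int[] column, as in the table above, lets the range conditions use the index directly.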

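In reply to a_horse_with_no_name's comment: "The order of the conditions in the where clause is irrelevant to index usage. It is better to put the column that can be used for the equality expression first in the index, followed by the one for the range expression."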

BEGIN;

create table Submissions (
    content text,
    created_at timestamp
);


CREATE FUNCTION  expensive_chksm(varchar) RETURNS varchar AS $$
SELECT $1;
$$ LANGUAGE sql;

INSERT INTO Submissions SELECT (random() * 1000000000)::text, '2016-01-01'::timestamp + ('1 hour')::interval * random() * 10000 FROM generate_series(1, 1000000);
INSERT INTO Submissions SELECT '77ac76dc0d4622ba9aa795acafc05f1e', '2016-01-01'::timestamp + ('1 hour')::interval * random() * 10000 FROM generate_series(1, 100000);

create index Submissions_1 on Submissions (created_at DESC, expensive_chksm(content));
-- create index Submissions_2 on Submissions (expensive_chksm(content), created_at DESC);

EXPLAIN ANALYZE select * from Submissions
where
    created_at <= '2016-07-12' and
    expensive_chksm(content) = '77ac76dc0d4622ba9aa795acafc05f1e'
order by created_at desc
limit 1;

Using Submissions_1:

                                                                        QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..10.98 rows=1 width=40) (actual time=0.018..0.019 rows=1 loops=1)
   ->  Index Scan using submissions_1 on submissions  (cost=0.43..19341.43 rows=1833 width=40) (actual time=0.018..0.018 rows=1 loops=1)
         Index Cond: ((created_at <= '2016-07-12 00:00:00'::timestamp without time zone) AND ((content)::text = '77ac76dc0d4622ba9aa795acafc05f1e'::text))
 Planning time: 0.257 ms
 Execution time: 0.033 ms

Using Submissions_2:

                                                                             QUERY PLAN                                                                               
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=4482.39..4482.40 rows=1 width=40) (actual time=29.096..29.096 rows=1 loops=1)
   ->  Sort  (cost=4482.39..4486.98 rows=1833 width=40) (actual time=29.095..29.095 rows=1 loops=1)
         Sort Key: created_at DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Bitmap Heap Scan on submissions  (cost=67.22..4473.23 rows=1833 width=40) (actual time=15.457..23.683 rows=46419 loops=1)
               Recheck Cond: (((content)::text = '77ac76dc0d4622ba9aa795acafc05f1e'::text) AND (created_at <= '2016-07-12 00:00:00'::timestamp without time zone))
               Heap Blocks: exact=936
               ->  Bitmap Index Scan on submissions_1  (cost=0.00..66.76 rows=1833 width=0) (actual time=15.284..15.284 rows=46419 loops=1)
                     Index Cond: (((content)::text = '77ac76dc0d4622ba9aa795acafc05f1e'::text) AND (created_at <= '2016-07-12 00:00:00'::timestamp without time zone))
 Planning time: 0.583 ms
 Execution time: 29.134 ms

PostgreSQL 9.6.1

Answer 1: (score: 1)

You can use a subquery for the timestamp and ordering part, then run the checksum outside it.

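A sketch of what such a subquery could look like, not necessarily the answerer's exact query. The alias recent and the OFFSET 0 line are assumptions: OFFSET 0 is a commonly used way to stop Postgres from pushing the outer predicate down into the subquery, which may be why the question's inline view was planned like the original query. The sketch also relies on the outer query keeping the subquery's row order, which Postgres does in practice for a plain scan like this but which SQL does not formally guarantee, so the plan is worth checking with explain analyze.

```sql
SELECT *
FROM (
    -- Inner query: the cheap timestamp filter and the ordering only.
    SELECT *
    FROM Submissions
    WHERE created_at <= '2016-07-12'
    ORDER BY created_at DESC
    OFFSET 0  -- assumption: keeps the checksum predicate out of this subquery
) AS recent
-- Outer query: the expensive checksum; LIMIT 1 stops at the first match.
WHERE expensive_chksm(content) = '77ac76dc0d4622ba9aa795acafc05f1e'
LIMIT 1;
```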

Answer 2: (score: 0)

If you are always going to query by checksum, an alternative is to add another column to the table called checksum, for example:

create table Submissions (
    content text,
    created_at timestamp,
    checksum varchar
);

Then you can store the checksum whenever a row is inserted or updated (or write a trigger that does this for you), and query the checksum column directly for fast results.
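A minimal sketch of the trigger variant, assuming the three-column table above and the expensive_chksm function from the question. The function, trigger, and index names are hypothetical. EXECUTE PROCEDURE is used because it works on PostgreSQL 9.6 and is still accepted by later versions.

```sql
-- Hypothetical trigger function: fills in the checksum column on write.
CREATE FUNCTION set_submission_checksum() RETURNS trigger AS $$
BEGIN
    NEW.checksum := expensive_chksm(NEW.content);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Recompute only on insert or when content itself changes.
CREATE TRIGGER submissions_set_checksum
BEFORE INSERT OR UPDATE OF content ON Submissions
FOR EACH ROW EXECUTE PROCEDURE set_submission_checksum();

-- Equality column first, then the range/order column.
CREATE INDEX Submissions_checksum_created_at
    ON Submissions (checksum, created_at DESC);

SELECT *
FROM Submissions
WHERE checksum = '77ac76dc0d4622ba9aa795acafc05f1e'
  AND created_at <= '2016-07-12'
ORDER BY created_at DESC
LIMIT 1;
```

The expensive function then runs once per write instead of once per row per query.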

Answer 3: (score: 0)

Try this:

select *
from Submissions
where created_at = (
  select max(created_at)
  from Submissions
  where created_at <= '2016-07-12'  -- reference date from the question
    and expensive_chksm(content) = '77ac76dc0d4622ba9aa795acafc05f1e');