Comparing timestamp with epoch seconds

Date: 2017-10-30 14:35:19

Tags: postgresql timestamp plpgsql epoch postgresql-9.5

In PostgreSQL 9.5 I have a table with 67000 records:

# \d words_nouns
           Table "public.words_nouns"
 Column  |           Type           | Modifiers 
---------+--------------------------+-----------
 word    | text                     | not null
 hashed  | text                     | not null
 added   | timestamp with time zone | 
 removed | timestamp with time zone | 
Indexes:
    "words_nouns_pkey" PRIMARY KEY, btree (word)
Check constraints:
    "words_nouns_word_check" CHECK (word ~ '^[A-Z]{2,}$'::text)

A similar table, words_verbs, has 36000 records.

Is it a good idea to define the following custom function:

CREATE OR REPLACE FUNCTION words_get_added(
                in_visited integer,
                OUT out_json jsonb
        ) RETURNS jsonb AS
$func$
DECLARE
        _added text[];
BEGIN
        -- create array with words added to dictionary since in_visited timestamp
        IF in_visited > 0 THEN
                _added := (
                        SELECT ARRAY_AGG(hashed) 
                        FROM words_nouns 
                        WHERE EXTRACT(EPOCH FROM added) > in_visited
                        UNION
                        SELECT ARRAY_AGG(hashed) 
                        FROM words_verbs 
                        WHERE EXTRACT(EPOCH FROM added) > in_visited
                );

                IF CARDINALITY(_added) > 0 THEN
                        out_json := jsonb_build_object('added', _added);
                END IF;
        END IF;
END
$func$ LANGUAGE plpgsql;

Or should I rather convert in_visited to a timestamp with time zone and compare against that:

CREATE OR REPLACE FUNCTION words_get_added(
                in_visited integer,
                OUT out_json jsonb
        ) RETURNS jsonb AS
$func$
DECLARE
        _added text[];
BEGIN
        -- create array with words added to dictionary since in_visited timestamp
        IF in_visited > 0 THEN
                _added := (
                        SELECT ARRAY_AGG(hashed) 
                        FROM words_nouns 
                        WHERE added > TO_TIMESTAMP(in_visited)
                        UNION
                        SELECT ARRAY_AGG(hashed) 
                        FROM words_verbs 
                        WHERE added > TO_TIMESTAMP(in_visited)
                );

                IF CARDINALITY(_added) > 0 THEN
                        out_json := jsonb_build_object('added', _added);
                END IF;
        END IF;
END
$func$ LANGUAGE plpgsql;

Here are the 2 EXPLAIN outputs, but I am not sure how to interpret them:

# EXPLAIN SELECT ARRAY_AGG(hashed)
FROM words_nouns
WHERE EXTRACT(EPOCH FROM added) > 0
UNION
SELECT ARRAY_AGG(hashed)
FROM words_verbs
WHERE EXTRACT(EPOCH FROM added) > 0;
                                         QUERY PLAN                                          
---------------------------------------------------------------------------------------------
 Unique  (cost=2707.03..2707.04 rows=2 width=32)
   ->  Sort  (cost=2707.03..2707.03 rows=2 width=32)
         Sort Key: (array_agg(words_nouns.hashed))
         ->  Append  (cost=1740.53..2707.02 rows=2 width=32)
               ->  Aggregate  (cost=1740.53..1740.54 rows=1 width=32)
                     ->  Seq Scan on words_nouns  (cost=0.00..1684.66 rows=22348 width=32)
                           Filter: (date_part('epoch'::text, added) > '0'::double precision)
               ->  Aggregate  (cost=966.45..966.46 rows=1 width=32)
                     ->  Seq Scan on words_verbs  (cost=0.00..936.05 rows=12157 width=32)
                           Filter: (date_part('epoch'::text, added) > '0'::double precision)
(10 rows)

# EXPLAIN SELECT ARRAY_AGG(hashed)
FROM words_nouns
WHERE added > to_timestamp(0)
UNION
SELECT ARRAY_AGG(hashed)
FROM words_verbs
WHERE added > to_timestamp(0);
                                           QUERY PLAN                                           
------------------------------------------------------------------------------------------------
 Unique  (cost=2361.99..2362.00 rows=2 width=32)
   ->  Sort  (cost=2361.99..2361.99 rows=2 width=32)
         Sort Key: (array_agg(words_nouns.hashed))
         ->  Append  (cost=1517.06..2361.98 rows=2 width=32)
               ->  Aggregate  (cost=1517.06..1517.07 rows=1 width=32)
                     ->  Seq Scan on words_nouns  (cost=0.00..1517.05 rows=1 width=32)
                           Filter: (added > '1970-01-01 01:00:00+01'::timestamp with time zone)
               ->  Aggregate  (cost=844.88..844.89 rows=1 width=32)
                     ->  Seq Scan on words_verbs  (cost=0.00..844.88 rows=1 width=32)
                           Filter: (added > '1970-01-01 01:00:00+01'::timestamp with time zone)
(10 rows)

The question is: which of the 2 stored functions performs better, or is there no difference?

1 answer:

Answer 0 (score: 2)

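The key factor for performance is an index that matches your query. Typically you have a plain index on the column added, and for that index to be applicable, the bare column has to be compared to an input value of the same type (timestamp with time zone).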
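For illustration, a minimal sketch of such plain indexes on both tables. The index names are hypothetical, and the CREATE TABLE IF NOT EXISTS statements only mirror the \d output from the question so the snippet runs on its own (words_verbs is assumed to have the same columns as words_nouns; both statements do nothing if the tables already exist).

```sql
-- Mirrors of the tables from the question; no-ops if they already exist.
CREATE TABLE IF NOT EXISTS words_nouns (
    word    text PRIMARY KEY CHECK (word ~ '^[A-Z]{2,}$'),
    hashed  text NOT NULL,
    added   timestamptz,
    removed timestamptz
);
-- Assumption: words_verbs has the same structure as words_nouns.
CREATE TABLE IF NOT EXISTS words_verbs (LIKE words_nouns INCLUDING ALL);

-- Plain B-tree indexes on the bare column "added" (hypothetical names).
CREATE INDEX IF NOT EXISTS words_nouns_added_idx ON words_nouns (added);
CREATE INDEX IF NOT EXISTS words_verbs_added_idx ON words_verbs (added);
```

With these indexes the planner can consider an index or bitmap scan for a predicate like added > TO_TIMESTAMP(in_visited) when the cut-off is selective enough; a predicate on EXTRACT(EPOCH FROM added) cannot use them.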

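For the task at hand, use a plain index on added in combination with your second function (added > TO_TIMESTAMP(in_visited)), or a variation on the theme. The conversion function is applied to the input value before it is compared to the column added, so the expression is "sargable". In the first function, by contrast, EXTRACT(EPOCH FROM added) has to be computed for every row, so a plain index on added cannot be used.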
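One such variation, as a hedged sketch rather than a drop-in replacement: it converts the epoch seconds into a timestamptz once (the variable name _since is made up for this example) and compares the bare column added against it. It also aggregates only once, over a UNION ALL of both tables. In the question's version each SELECT ARRAY_AGG(...) always returns exactly one row (NULL when nothing matches), so the UNION yields two rows whenever the two results differ, including an array versus NULL, and the assignment to _added then fails with "more than one row returned by a subquery used as an expression".

```sql
CREATE OR REPLACE FUNCTION words_get_added(
        in_visited integer,
        OUT out_json jsonb
) RETURNS jsonb AS
$func$
DECLARE
        -- Convert the epoch seconds once; the column "added" stays bare (sargable).
        _since timestamptz := TO_TIMESTAMP(in_visited);
        _added text[];
BEGIN
        -- create array with words added to dictionary since in_visited timestamp
        IF in_visited > 0 THEN
                -- A single aggregate over both tables always yields exactly one row.
                SELECT ARRAY_AGG(hashed)
                INTO   _added
                FROM (
                        SELECT hashed FROM words_nouns WHERE added > _since
                        UNION ALL
                        SELECT hashed FROM words_verbs WHERE added > _since
                ) AS sub;

                -- ARRAY_AGG over zero rows returns NULL, so out_json stays NULL then.
                IF CARDINALITY(_added) > 0 THEN
                        out_json := jsonb_build_object('added', _added);
                END IF;
        END IF;
END
$func$ LANGUAGE plpgsql;
```

UNION ALL keeps a hash twice if the same word exists in both tables; UNION would remove such duplicates at the cost of an extra sort or hash step.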

For top read performance, you might have a multicolumn index on (added, hashed) and keep the table vacuumed enough to allow index-only scans ...
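A sketch of that setup, assuming hypothetical index names; the CREATE TABLE IF NOT EXISTS lines only mirror the question's tables so the snippet runs on its own. Whether the planner actually picks an index-only scan depends on the data, the selectivity of the cut-off and how much of the visibility map is set, so the final EXPLAIN is only a way to check (the epoch value in it is an arbitrary example).

```sql
-- Mirrors of the tables from the question; no-ops if they already exist.
CREATE TABLE IF NOT EXISTS words_nouns (
    word    text PRIMARY KEY CHECK (word ~ '^[A-Z]{2,}$'),
    hashed  text NOT NULL,
    added   timestamptz,
    removed timestamptz
);
-- Assumption: words_verbs has the same structure as words_nouns.
CREATE TABLE IF NOT EXISTS words_verbs (LIKE words_nouns INCLUDING ALL);

-- "added" first for the range condition, "hashed" second so the query
-- can be answered from the index alone (hypothetical index names).
CREATE INDEX IF NOT EXISTS words_nouns_added_hashed_idx ON words_nouns (added, hashed);
CREATE INDEX IF NOT EXISTS words_verbs_added_hashed_idx ON words_verbs (added, hashed);

-- Index-only scans rely on the visibility map, which VACUUM keeps current;
-- ANALYZE refreshes planner statistics in the same pass.
-- (VACUUM cannot run inside a transaction block.)
VACUUM ANALYZE words_nouns;
VACUUM ANALYZE words_verbs;

-- Inspect the plan; with few matching rows, look for "Index Only Scan".
EXPLAIN
SELECT hashed FROM words_nouns WHERE added > TO_TIMESTAMP(1500000000);
```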

Related: