How to identify the chunk to decompress before an insert?

Time: 2020-08-05 04:48:23

Tags: compression timescaledb

The TimescaleDB documentation shows how to decompress a specific chunk:

SELECT decompress_chunk('chunk_name');

or all chunks of a given hypertable:

SELECT decompress_chunk(show_chunks('hypertable_name'));

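However, this means you either need to know which chunk you are going to insert into, or you decompress the whole table. I am working with a large table (> 100 GB uncompressed), so decompressing the whole table is impractical, especially since it has an additional dimension (used for chunking alongside the timestamp).

Is it possible to find the chunks relevant to my query, given ranges for the datetime and the dimension?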

1 answer:

Answer 0: (score: 2)

Update: the answer was tested on TimescaleDB 1.7.

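A specific chunk matching given values of the time and space dimensions can be found with the help of show_chunks, the public information views in the timescaledb_information schema, and the internal catalog tables in _timescaledb_catalog.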

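First, show_chunks has the optional arguments older_than and newer_than, which make it possible to find the chunks that are older or newer than a given timestamp and then subtract them from the set of all chunks. For example:

SELECT c.chunk_name
FROM (SELECT show_chunks('hyper') AS chunk_name
    EXCEPT (SELECT show_chunks('hyper', older_than => '2018-07-02 06:01'::timestamptz))
    EXCEPT (SELECT show_chunks('hyper', newer_than => '2018-07-02 06:01'::timestamptz))) AS c;

Restricting the result to compressed chunks, i.e. compression_status = 'Compressed' in timescaledb_information.compressed_chunk_stats, helps as well.

If a space dimension is also defined on the hypertable, the query above returns as many chunks as there are partitions on that space dimension. To find the right one, it is necessary to check which chunk the space dimension value belongs to; the space dimension ranges are stored in _timescaledb_catalog.dimension_slice. An example of the final query is at the end.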
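To get a feel for the ranges mentioned above, the slices of every chunk can be listed directly. This is only a sketch: it assumes the internal catalog layout of TimescaleDB 1.7 (the tables hypertable, chunk, chunk_constraint, dimension_slice and dimension in _timescaledb_catalog), and 'hypertable_name' is a placeholder for your own hypertable. The ranges are stored as integers; for a space dimension they bound the partition hash, not the raw column value.

```sql
-- Sketch only: reads internal catalog tables (TimescaleDB 1.7 layout assumed).
-- 'hypertable_name' is a placeholder, not a real object.
SELECT cc.schema_name || '.' || cc.table_name AS chunk,
       d.column_name                          AS dimension,
       ds.range_start,
       ds.range_end
FROM _timescaledb_catalog.hypertable h
JOIN _timescaledb_catalog.chunk cc             ON cc.hypertable_id = h.id
JOIN _timescaledb_catalog.chunk_constraint ccn ON ccn.chunk_id = cc.id
JOIN _timescaledb_catalog.dimension_slice ds   ON ds.id = ccn.dimension_slice_id
JOIN _timescaledb_catalog.dimension d          ON d.id = ds.dimension_id
WHERE h.table_name = 'hypertable_name'
ORDER BY chunk, dimension;
```

Each chunk should appear once per dimension, so a hypertable with a time and a space dimension yields two rows per chunk.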

Let's work through an example:

CREATE TABLE hyper(
    time timestamptz NOT NULL,
    device int,
    value float
);
SELECT * FROM create_hypertable('hyper', 'time', 'device', 2);

ALTER TABLE hyper SET (timescaledb.compress,
                       timescaledb.compress_segmentby='device',
                       timescaledb.compress_orderby = 'time DESC');

INSERT INTO hyper VALUES
       ('2017-01-01 06:01', 1, 1.2),
       ('2017-01-01 09:11', 3, 4.3),
       ('2017-01-01 08:01', 1, 7.3),
       ('2017-01-02 08:01', 2, 0.23),
       ('2018-07-02 08:01', 87, 0.0),
       ('2018-07-01 06:01', 13, 3.1),
       ('2018-07-01 09:11', 90, 10303.12),
       ('2018-07-01 08:01', 29, 64),
       ('2019-07-02 08:01', 87, 0.0),
       ('2019-07-01 06:01', 13, 3.1),
       ('2019-07-01 09:11', 90, 10303.12),
       ('2019-07-01 08:01', 29, 64);

SELECT compress_chunk(show_chunks('hyper'));

The last query compresses all chunks and gives the result:

            compress_chunk
-----------------------------------------
 _timescaledb_internal._hyper_3_13_chunk
 _timescaledb_internal._hyper_3_14_chunk
 _timescaledb_internal._hyper_3_15_chunk
 _timescaledb_internal._hyper_3_16_chunk
 _timescaledb_internal._hyper_3_17_chunk
 _timescaledb_internal._hyper_3_18_chunk
(6 rows)

Our aim is to insert the following value:

INSERT INTO hyper VALUES ('2018-07-02 06:01', 12, 5.1);

which fails with:

ERROR:  insert/update/delete not permitted on chunk "_hyper_3_16_chunk"
HINT:  Make sure the chunk is not compressed.

The following query finds the chunks that satisfy the time value:

SELECT c.chunk_name
FROM (SELECT show_chunks('hyper') AS chunk_name
    EXCEPT (SELECT show_chunks('hyper', older_than => '2018-07-02 06:01'::timestamptz))
    EXCEPT (SELECT show_chunks('hyper', newer_than => '2018-07-02 06:01'::timestamptz))) AS c
JOIN timescaledb_information.compressed_chunk_stats i ON i.chunk_name = c.chunk_name;

It returns 2 chunks, since the one space dimension has 2 partitions:

               chunk_name
-----------------------------------------
 _timescaledb_internal._hyper_3_15_chunk
 _timescaledb_internal._hyper_3_16_chunk
(2 rows)

Update with more details: the chunk for the given device value can be narrowed down further by checking the range values stored in _timescaledb_catalog.dimension_slice. The selected chunks are matched to their dimension slices by joining _timescaledb_catalog.chunk on chunk_name, _timescaledb_catalog.chunk_constraint on chunk_id and finally _timescaledb_catalog.dimension_slice on dimension_slice_id. The dimension slice is selected by range, using the hash value. This condition is the same as the constraint on the chunk table, which can be seen, for example, with \d _chunk_name:

\d _timescaledb_internal._hyper_1_1_chunk
           Table "_timescaledb_internal._hyper_1_1_chunk"
 Column |           Type           | Collation | Nullable | Default
--------+--------------------------+-----------+----------+---------
 time   | timestamp with time zone |           | not null |
 device | integer                  |           |          |
 value  | double precision         |           |          |
Indexes:
    "_hyper_1_1_chunk_hyper_device_time_idx" btree (device, "time" DESC)
    "_hyper_1_1_chunk_hyper_time_idx" btree ("time" DESC)
Check constraints:
    "constraint_1" CHECK ("time" >= '2016-12-29 01:00:00+01'::timestamp with time zone AND "time" < '2017-01-05 01:00:00+01'::timestamp with time zone)
    "constraint_2" CHECK (_timescaledb_internal.get_partition_hash(device) < 1073741823)
Triggers:
    compressed_chunk_insert_blocker BEFORE INSERT ON _timescaledb_internal._hyper_1_1_chunk FOR EACH ROW EXECUTE PROCEDURE _timescaledb_internal.chunk_dml_blocker()
Inherits: hyper

The following query demonstrates how to use the internal catalog on top of the result of the query above to get the exact chunk to decompress:

SELECT ch.chunk_name
FROM (SELECT c.chunk_name
      FROM (SELECT show_chunks('hyper') AS chunk_name
           EXCEPT (SELECT show_chunks('hyper', older_than => '2018-07-02 06:01'::timestamptz))
           EXCEPT (SELECT show_chunks('hyper', newer_than => '2018-07-02 06:01'::timestamptz))) AS c
        JOIN timescaledb_information.compressed_chunk_stats i ON i.chunk_name = c.chunk_name
      WHERE i.compression_status = 'Compressed') ch
  JOIN _timescaledb_catalog.chunk cc ON chunk_name::text = schema_name||'.'||table_name
  JOIN _timescaledb_catalog.chunk_constraint ON cc.id = chunk_id
  JOIN _timescaledb_catalog.dimension_slice ds ON dimension_slice_id = ds.id
WHERE range_start <= _timescaledb_internal.get_partition_hash(12)
    AND range_end > _timescaledb_internal.get_partition_hash(12);

The result of the query is:

               chunk_name
-----------------------------------------
 _timescaledb_internal._hyper_3_16_chunk
(1 row)

This statement can be turned into a function that takes the values for time and device as input.
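A minimal sketch of such a function follows. The name find_compressed_chunk and its parameter names are made up for illustration and are not part of TimescaleDB. The body is the lookup query with the literals replaced by parameters, so it carries the same assumptions: the TimescaleDB 1.7 catalog layout and an integer space dimension such as device.

```sql
-- Hypothetical helper: name and signature are assumptions, not a TimescaleDB API.
-- Returns the compressed chunk(s) of a hypertable that would receive a row
-- with the given time and device values.
CREATE OR REPLACE FUNCTION find_compressed_chunk(
    hypertable regclass,
    ts         timestamptz,
    device_id  int
) RETURNS SETOF regclass
LANGUAGE sql AS
$$
    SELECT ch.chunk_name
    FROM (SELECT c.chunk_name
          -- all chunks minus those entirely older or newer than ts
          FROM (SELECT show_chunks(hypertable) AS chunk_name
                EXCEPT (SELECT show_chunks(hypertable, older_than => ts))
                EXCEPT (SELECT show_chunks(hypertable, newer_than => ts))) AS c
          JOIN timescaledb_information.compressed_chunk_stats i
            ON i.chunk_name = c.chunk_name
          WHERE i.compression_status = 'Compressed') AS ch
    JOIN _timescaledb_catalog.chunk cc
      ON ch.chunk_name::text = cc.schema_name || '.' || cc.table_name
    JOIN _timescaledb_catalog.chunk_constraint ccn ON ccn.chunk_id = cc.id
    JOIN _timescaledb_catalog.dimension_slice ds   ON ds.id = ccn.dimension_slice_id
    -- keep only the slice whose hash range contains the device value
    WHERE ds.range_start <= _timescaledb_internal.get_partition_hash(device_id)
      AND ds.range_end   >  _timescaledb_internal.get_partition_hash(device_id);
$$;

-- Example call with the values from this answer:
SELECT find_compressed_chunk('hyper', '2018-07-02 06:01', 12);
```

Returning SETOF regclass keeps the result usable as an argument to decompress_chunk, in the same way show_chunks is used in the question.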

The final query, which answers the question, is just the query above modified to call decompress_chunk:

SELECT decompress_chunk(ch.chunk_name)
FROM (SELECT c.chunk_name
      FROM (SELECT show_chunks('hyper') AS chunk_name
           EXCEPT (SELECT show_chunks('hyper', older_than => '2018-07-02 06:01'::timestamptz))
           EXCEPT (SELECT show_chunks('hyper', newer_than => '2018-07-02 06:01'::timestamptz))) AS c
        JOIN timescaledb_information.compressed_chunk_stats i ON i.chunk_name = c.chunk_name
      WHERE i.compression_status = 'Compressed') ch
  JOIN _timescaledb_catalog.chunk cc ON chunk_name::text = schema_name||'.'||table_name
  JOIN _timescaledb_catalog.chunk_constraint ON cc.id = chunk_id
  JOIN _timescaledb_catalog.dimension_slice ds ON dimension_slice_id = ds.id
WHERE range_start <= _timescaledb_internal.get_partition_hash(12)
    AND range_end > _timescaledb_internal.get_partition_hash(12);

After that, the insert succeeds:

INSERT INTO hyper VALUES ('2018-07-02 06:01', 12, 5.1);
-- INSERT 0 1

Backfill use case: if the insert is part of backfilling data, the timescaledb-extras project contains a procedure that decompresses the necessary chunks and backfills the data from a source table.

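Note that the queries answering the question might not work in newer versions of TimescaleDB, because they use the internal catalog.

I am not aware whether the same can be achieved using only public interfaces.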