Question

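I've been playing with TimescaleDB recently, but I'm a bit confused and need some pointers on why my query is running slowly, or confirmation that this is typical performance for a TimescaleDB query.

The dataset I'm using is market quote data for one particular date, for which I have loaded about 84 million records into my hypertable.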

Here is a sample of the data in my file:

2018-12-03 00:00:00.000344+00:00,2181.T,2018-12-03,2179,56300,2180,59500
2018-12-03 00:00:00.000629+00:00,1570.T,2018-12-03,20470,555118,20480,483857
2018-12-03 00:00:00.000631+00:00,2002.T,2018-12-03,2403,30300,2404,30200

My table is created like this:

CREATE TABLE tt1 (time        TIMESTAMPTZ           NOT NULL,
cusip       varchar(40)           NOT NULL,
date        DATE                NULL,
value       DOUBLE PRECISION,
value2      DOUBLE PRECISION,
value3      DOUBLE PRECISION,
value4      DOUBLE PRECISION);

I created two versions of the hypertable: tt1, which has 1-minute chunks, and tt30m, which has 30-minute chunks. Both tables follow the same schema shown above. I created the hypertable like this:

SELECT create_hypertable('tt1', 'time', chunk_time_interval => interval '1 minute');

Both versions of the hypertable have indexes on the time and cusip columns. Creating a hypertable indexes time by default, and I created the cusip index with:

CREATE INDEX ON tt1(cusip, time DESC);

My query is as follows:

EXPLAIN ANALYZE SELECT time_bucket('15 minutes', time) AS fifteen_min,
  cusip, COUNT(*)
  FROM tt1
  WHERE time > timestamp '2018-12-03 05:10:06.174704-05' - interval '3 hours'
  GROUP BY fifteen_min, cusip
  ORDER BY fifteen_min DESC;

With 1-minute chunks, the query takes 25.686 seconds. With 30-minute chunks, it takes 25.969 seconds. Here is the query plan for the 30-minute version (tt30m):

                                                                                 QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize GroupAggregate  (cost=1679944.84..1685344.84 rows=40000 width=40) (actual time=25770.209..25873.410 rows=305849 loops=1)
   Group Key: (time_bucket('00:15:00'::interval, tt30m."time")), tt30m.cusip
   ->  Gather Merge  (cost=1679944.84..1684544.84 rows=40000 width=40) (actual time=25770.181..25885.080 rows=305849 loops=1)
         Workers Planned: 1
         Workers Launched: 1
         ->  Sort  (cost=1678944.83..1679044.83 rows=40000 width=40) (actual time=12880.868..12911.917 rows=152924 loops=2)
               Sort Key: (time_bucket('00:15:00'::interval, tt30m."time")) DESC, tt30m.cusip
               Sort Method: quicksort  Memory: 25kB
               Worker 0:  Sort Method: external merge  Disk: 10976kB
               ->  Partial HashAggregate  (cost=1675387.29..1675887.29 rows=40000 width=40) (actual time=12501.381..12536.373 rows=152924 loops=2)
                     Group Key: time_bucket('00:15:00'::interval, tt30m."time"), tt30m.cusip
                     ->  Parallel Custom Scan (ChunkAppend) on tt30m  (cost=10680.22..1416961.58 rows=34456761 width=32) (actual time=0.020..7293.929 rows=24255398 loops=2)
                           Chunks excluded during startup: 14
                           ->  Parallel Seq Scan on _hyper_2_753_chunk  (cost=0.00..116011.42 rows=4366426 width=17) (actual time=0.037..1502.121 rows=7423073 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_2_755_chunk  (cost=0.00..108809.26 rows=4095539 width=17) (actual time=0.017..1446.248 rows=6962556 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_2_754_chunk  (cost=0.00..107469.27 rows=4056341 width=17) (actual time=0.015..1325.638 rows=6895917 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_2_756_chunk  (cost=0.00..99037.70 rows=3730381 width=17) (actual time=0.006..1206.708 rows=6341775 loops=1)
                                 Filter: ("time" > '2018-12-03 02:10:06.174704'::timestamp without time zone)
                           ->  Parallel Seq Scan on _hyper_2_758_chunk  (cost=0.00..90757.67 rows=3421505 width=17) (actual time=0.017..1126.757 rows=5816675 loops

Time: 25968.520 ms (00:25.969)

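Basically, what I'm looking for is some pointers on whether this is the expected performance of TimescaleDB, or whether there are ways to optimize this query.

I have already run the timescaledb-tune tool and accepted all of the optimizations it suggested. I'm running this on a Linux VM via VirtualBox. The VM has 20 GB of RAM, 250 GB+ of disk space, and 2 CPUs. The Postgres version is 11.6 and the TimescaleDB version is 1.5.0. The output of dump_meta_data is attached here: dump meta data output

Thank you very much for any replies :)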

Answer 1

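In either case, this query needs to scan all the records in the 3-hour period, and that is what is taking the time. There are a few ways to speed this sort of thing up.

One is hardware. The query requires a good deal of IO, and your virtual machine is fairly small and is probably slowing IO down significantly, so a bigger box would help.

Changing the chunk size will have little effect on this sort of query. In fact, I would recommend larger chunks, since 84 million rows isn't many.

The other option, if this is a type of query you are going to run a lot, is to use continuous aggregates to pre-compute some of the work for you. That can save you time as well as CPU, memory, and IO.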
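As a rough illustration of the continuous aggregate suggestion, the sketch below pre-computes the 15-minute counts per cusip. It assumes TimescaleDB 1.x syntax (`CREATE VIEW ... WITH (timescaledb.continuous)`, matching the 1.5.0 version in the question; 2.x uses `CREATE MATERIALIZED VIEW` instead). The view name `tt1_15min` and the column alias `n` are hypothetical names chosen for this example.

```sql
-- Hypothetical view name tt1_15min; TimescaleDB 1.x continuous aggregate syntax.
-- The aggregate is maintained in the background, so the query below
-- reads pre-computed buckets instead of scanning the raw rows.
CREATE VIEW tt1_15min
WITH (timescaledb.continuous) AS
SELECT time_bucket('15 minutes', time) AS fifteen_min,
       cusip,
       COUNT(*) AS n            -- alias n is an assumption for this sketch
FROM tt1
GROUP BY fifteen_min, cusip;

-- Same shape of result as the original query, read from the aggregate.
SELECT fifteen_min, cusip, n
FROM tt1_15min
WHERE fifteen_min > timestamptz '2018-12-03 05:10:06.174704-05' - interval '3 hours'
ORDER BY fifteen_min DESC;
```

Note that this filters on whole buckets, so the oldest bucket is either included in full or left out, whereas the original query counts only the rows after the cutoff inside that bucket. The view is also populated by a background job, so recent data may not appear in it immediately.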

TimescaleDB - time bucket query is slow with different variations of index and chunk size - am I doing something wrong?