Selecting every n-th element of a PostgreSQL array

Posted: 2019-06-19 18:57:41

Tags: sql postgresql

I am working on a project that involves a lot of measurement data. The goal is to store this data in a database. There are 5 large data arrays (each holding about 1 million floating-point values).

We are using PostgreSQL 11.3 for the database, and we thought using Postgres arrays would be a good idea. Saving and retrieving the data works fine so far, but we want to build a small web application that displays these values graphically. Of course, transferring arrays this large is impractical and would make the whole process very slow, so our idea is to select only every 10,000th value and send that. This is enough to draw a simple graph with sufficient detail.

So is there a way to write an SQL query that does this? The only documented feature we found is array slicing, but that only selects data from a start index to an end index. Or do you have any tricks for solving this kind of problem? We have complete freedom over the database structure and are at an early stage of development, so creating a new schema would also work.

Here is our table structure so far:

CREATE TABLE public."DataPoints"
(
    "Id" integer NOT NULL DEFAULT nextval('"DataPoints_Id_seq"'::regclass),
    "TLP_Voltage" double precision NOT NULL,
    "Delay" double precision NOT NULL,
    "Time_Resolution" double precision NOT NULL,
    "Time_Values" double precision[] NOT NULL,
    "Voltage_Offset" double precision NOT NULL,
    "Voltage_Resolution" double precision NOT NULL,
    "Voltage_Values" double precision[] NOT NULL,
    "Current_Offset" double precision NOT NULL,
    "Current_Resolution" double precision NOT NULL,
    "Current_Values" double precision[] NOT NULL,
    "Aux_1_Offset" double precision,
    "Aux_1_Resolution" double precision,
    "Aux_1_Values" double precision[],
    "Aux_2_Offset" double precision,
    "Aux_2_Resolution" double precision,
    "Aux_2_Values" double precision[],
    "Measurement_Id" integer NOT NULL,
    "Sequence_Id" integer NOT NULL,
    CONSTRAINT "DataPoints_pkey" PRIMARY KEY ("Id"),
    CONSTRAINT "DataPoints_Measurement_Id_fkey" FOREIGN KEY ("Measurement_Id")
        REFERENCES public."Measurements" ("Id") MATCH SIMPLE
        ON UPDATE NO ACTION
        ON DELETE NO ACTION
);

2 answers:

Answer 0 (score: 0)

One approach is to unnest and then re-aggregate:

-- v stands for any table with an array column "ar" (placeholders in this answer);
-- WITH ORDINALITY numbers the elements, so the filter keeps every 1,000th one.
select (select array_agg(x.a)
        from unnest(v.ar) with ordinality x(a, n)
        where x.n % 1000 = 1
       )
from v;
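Applied to the table from the question, the same pattern would look roughly like this. This is only a sketch: the table and column names are taken from the CREATE TABLE above, and the alias "Voltage_Preview" is made up for illustration.

```sql
-- Sketch: keep every 10,000th voltage value per row.
SELECT "Id",
       (SELECT array_agg(x.v ORDER BY x.n)
          FROM unnest("Voltage_Values") WITH ORDINALITY AS x(v, n)
         WHERE x.n % 10000 = 1) AS "Voltage_Preview"
FROM public."DataPoints";
```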

Answer 1 (score: 0)

You can also use generate_series.

create table test_array (c1 int[]);
insert into test_array (c1) VALUES (ARRAY[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]);

select x, c1[x]
FROM test_array,
-- Get every third element.  Change 3 to whatever the step should be.
generate_series(1, array_length(c1, 1), 3) as g(x);
x  | c1
----+----
  1 |  1
  4 |  4
  7 |  7
 10 | 10
 13 | 13
(5 rows)
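One nice property of this approach for the question's schema: the index produced by generate_series can subscript several parallel arrays at once, so each sampled point keeps its time and voltage together. A sketch, with column names assumed from the schema above:

```sql
-- Sketch: one shared index subscripts the parallel arrays.
SELECT g.x AS idx,
       "Time_Values"[g.x]    AS t,
       "Voltage_Values"[g.x] AS v
FROM public."DataPoints",
     generate_series(1, array_length("Voltage_Values", 1), 10000) AS g(x);
```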

Edit: After a bit of testing, it looks like Gordon's solution is considerably faster, which makes sense.

-- Create a 1 million element array
insert into test_array(c1) select array_agg(x) from generate_series(1,1000000) g(x);

-- My approach with generate_series:

explain analyze select x, c1[x] FROM test_array, generate_series(1, array_length(c1, 1), 1000) as g(x);
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.01..27223.60 rows=1360000 width=8) (actual time=3.929..910.291 rows=1000 loops=1)
   ->  Seq Scan on test_array  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.016..0.032 rows=1 loops=1)
   ->  Function Scan on generate_series g  (cost=0.01..10.01 rows=1000 width=4) (actual time=1.378..9.647 rows=1000 loops=1)
 Planning Time: 0.063 ms
 Execution Time: 919.515 ms
(5 rows)

-- Gordon's approach using unnest with ordinality
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2077.20 rows=1360 width=4) (actual time=109.685..246.758 rows=1000 loops=1)
   ->  Seq Scan on test_array  (cost=0.00..23.60 rows=1360 width=32) (actual time=0.035..0.049 rows=1 loops=1)
   ->  Function Scan on unnest x  (cost=0.00..1.50 rows=1 width=4) (actual time=109.603..233.817 rows=1000 loops=1)
         Filter: ((n % '1000'::bigint) = 1)
         Rows Removed by Filter: 999000
 Planning Time: 0.131 ms
 Execution Time: 256.515 ms
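Since the question mentions complete freedom over the schema, a further option (not from either answer) is to precompute the downsampled array once at write time and store it in an extra column, so the web application never touches the full arrays at all. A hedged sketch, where "Voltage_Preview" is a hypothetical new column:

```sql
-- Hypothetical preview column holding every 10,000th value.
ALTER TABLE public."DataPoints"
    ADD COLUMN "Voltage_Preview" double precision[];

UPDATE public."DataPoints" dp
   SET "Voltage_Preview" =
       (SELECT array_agg(x.v ORDER BY x.n)
          FROM unnest(dp."Voltage_Values") WITH ORDINALITY AS x(v, n)
         WHERE x.n % 10000 = 1);
```

This trades a little storage for a query that reads only ~100 elements per row instead of filtering a million.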