Question

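I'm having a lot of trouble optimizing an SQL query that takes a very long time to run on a data set of about 300,000 rows.

I'm running the query on the stat_records table, which has a decimal value column and a datetime recorded_at column.

I want to find the maximum and minimum value for each of the following periods: the past year, the past 6 months, the past 3 months, the past month, and the past 2 weeks.

The way I'm doing it now is to run the following SQL query separately for each of the intervals listed above: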

SELECT MIN("stat_records"."value")
FROM "stat_records"
   INNER JOIN "stats" ON "stats"."id" = "stat_records"."stat_id"
WHERE "stat_records"."object_id" = $1
  AND "stats"."identifier" = $2
  AND ("stat_records"."recorded_at" BETWEEN $3 AND $4)

[["object_id", 1],
 ["identifier", "usd"],
 ["recorded_at", "2018-10-15 20:10:58.418512"],
 ["recorded_at", "2018-12-15 20:11:59.351437"]]

The table is defined as:

create_table "stat_records", force: :cascade do |t|
  t.datetime "recorded_at"
  t.decimal "value"
  t.bigint "coin_id"
  t.bigint "object_id"
  t.index ["object_id"], name: "index_stat_records_on_object_id"
  t.index ["recorded_at", "object_id", "stat_id"], name: "for_upsert", unique: true
  t.index ["recorded_at", "stat_id"], name: "index_stat_records_on_recorded_at_and_stat_id", unique: true
  t.index ["recorded_at"], name: "index_stat_records_on_recorded_at"
  t.index ["stat_id"], name: "index_stat_records_on_stat_id"
  t.index ["value"], name: "index_stat_records_on_value"
end

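However, this approach takes forever to complete. I have indexes on both the value and recorded_at columns of the stat_records table.

What am I missing here, and what should I do to optimize this?

Maybe there is a better way to do it in a single query and let Postgres do the optimization for me.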

Answer 1

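An index can only speed up a query that needs a small part of the table (or a sort). So you can never expect an index to make the query faster for every time range: the longer the range, the larger the share of the table that has to be read.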
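One way to check this on your own data is to look at the execution plan. The sketch below simply wraps the query from the question in EXPLAIN, with the bind parameters replaced by the sample values shown there; the plan you get will depend on your data and settings.

```sql
-- Shows the chosen plan together with actual row counts, timings and buffer usage.
-- Note that ANALYZE really executes the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT min(stat_records.value)
FROM stat_records
   INNER JOIN stats ON stats.id = stat_records.stat_id
WHERE stat_records.object_id = 1
  AND stats.identifier = 'usd'
  AND stat_records.recorded_at BETWEEN '2018-10-15 20:10:58.418512'
                                   AND '2018-12-15 20:11:59.351437';
```

If the plan shows a sequential scan on stat_records for the long ranges, that is consistent with the point above: the planner has decided that too much of the table is needed for an index to pay off.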

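Your solution might be a materialized view. That way you can pre-aggregate the values, and the resulting table is much smaller, so queries against it will be faster. The downside is that a materialized view has to be refreshed regularly and will contain slightly stale data between refreshes.

An example: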

CREATE MATERIALIZED VIEW stats_per_month AS
SELECT stat_records.object_id,
       stats.identifier,
       date_trunc('month', stat_records.recorded_at) AS recorded_month,
       min(stat_records.value) AS minval
FROM stat_records
   INNER JOIN stats ON stats.id = stat_records.stat_id
GROUP BY stat_records.object_id,
         stats.identifier,
         date_trunc('month', stat_records.recorded_at);
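The periodic refresh mentioned above could look roughly like this. A plain REFRESH MATERIALIZED VIEW blocks reads of the view while it runs; the CONCURRENTLY variant avoids that but needs a unique index on the view. The index name here is made up for illustration, and how you schedule the refresh (cron, a background job, and so on) is up to you.

```sql
-- Hypothetical index name. CONCURRENTLY requires a unique index on the view
-- that uses only plain columns (no expressions) and has no WHERE clause.
CREATE UNIQUE INDEX stats_per_month_uniq
   ON stats_per_month (object_id, identifier, recorded_month);

-- Run this periodically; readers can keep querying the view meanwhile.
REFRESH MATERIALIZED VIEW CONCURRENTLY stats_per_month;
```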

If monthly granularity is good enough for your queries, you can simply query the materialized view instead of the original table.
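As a sketch, a "last 6 months" lookup against the view might look like the following. It assumes the stats_per_month view from the example above and reuses the sample object_id and identifier from the question. Because the view is aggregated per calendar month, the window can only start on a month boundary.

```sql
-- Minimum over the current month and the six calendar months before it.
SELECT min(minval) AS minval
FROM stats_per_month
WHERE object_id = 1
  AND identifier = 'usd'
  AND recorded_month >= date_trunc('month', current_timestamp) - INTERVAL '6 months';
```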

You could also use a hybrid solution and keep the original query for the small ranges, where stale data would hurt more. With the index on recorded_at, that should be fast.
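For the short ranges, the separate queries could also be folded into one pass over the original table using aggregate FILTER clauses (available since PostgreSQL 9.4). This is only a sketch: treating "2 weeks" and "1 month" as the small ranges is my assumption, the column aliases are made up, and it assumes that now() in your session's time zone lines up with how recorded_at is stored.

```sql
SELECT min(sr.value) FILTER (WHERE sr.recorded_at >= now() - INTERVAL '2 weeks') AS min_2_weeks,
       max(sr.value) FILTER (WHERE sr.recorded_at >= now() - INTERVAL '2 weeks') AS max_2_weeks,
       min(sr.value) AS min_1_month,  -- unfiltered: covers the whole WHERE range
       max(sr.value) AS max_1_month
FROM stat_records AS sr
   INNER JOIN stats AS s ON s.id = sr.stat_id
WHERE sr.object_id = 1
  AND s.identifier = 'usd'
  -- restrict the scan to the widest "small" range so the recorded_at index can help
  AND sr.recorded_at >= now() - INTERVAL '1 month';
```

The longer ranges (3 months, 6 months, 1 year) would then come from the materialized view.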

Optimizing an SQL query with multiple min and max ranges
