Question

I have a table with about 5 million rows:

CREATE TABLE audit_log
(
  event_time timestamp with time zone NOT NULL DEFAULT now(),
  action smallint, -- 1:modify, 2:create, 3:delete, 4:security, 9:other
  level smallint NOT NULL DEFAULT 20, -- 10:minor, 20:info, 30:warning, 40:error
  component_id character varying(150),
  CONSTRAINT audit_log_pk PRIMARY KEY (audit_log_id)
)
WITH (
  OIDS=FALSE
);

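I need to get all component IDs using something like SELECT component_id FROM audit_log GROUP BY component_id, and the query takes about 20 seconds to complete. How can I optimize it?

UPD:

I have an index on component_id: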

CREATE INDEX audit_log_component_id_idx
  ON audit_log
  USING btree
  (component_id COLLATE pg_catalog."default");

UPD 2: Well, I know one solution is to move the component names into a separate table, but I was hoping for a simpler solution. Thanks, guys.

Answer 1

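Create an index on the column component_id.

Since it is the only column used in the query, you can read the information directly from the index.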
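
A rough way to check whether the planner actually answers the query from the index alone. This is only a sketch, assuming PostgreSQL 9.2 or later (the first release with index-only scans) and the audit_log_component_id_idx index shown in the question:

```sql
-- Refresh the visibility map and planner statistics; an index-only scan
-- still has to visit the heap for pages not marked all-visible.
VACUUM ANALYZE audit_log;

-- Look for "Index Only Scan using audit_log_component_id_idx" in the plan.
-- Note that ANALYZE actually executes the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT component_id
FROM   audit_log
GROUP  BY component_id;
```

Even when an index-only scan is chosen, the whole index is still read, so the gain on a table of this size may be modest.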

You might also consider moving the component (currently a string) into a separate table and referencing it by an ID of type integer or similar.
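
A minimal sketch of that normalization. The components table, its id column, and the component_ref column are hypothetical names chosen for illustration, not part of the original schema:

```sql
-- Hypothetical lookup table: one row per distinct component name.
CREATE TABLE components (
  id           serial PRIMARY KEY,
  component_id character varying(150) NOT NULL UNIQUE
);

-- Seed it from the existing data (a one-time full scan).
INSERT INTO components (component_id)
SELECT DISTINCT component_id
FROM   audit_log
WHERE  component_id IS NOT NULL;

-- Hypothetical integer reference on the audit table.
ALTER TABLE audit_log
  ADD COLUMN component_ref integer REFERENCES components (id);

-- Backfill; this rewrites every matching row, so expect it to be slow.
UPDATE audit_log a
SET    component_ref = c.id
FROM   components c
WHERE  c.component_id = a.component_id;

-- Listing all components is now a scan of the small table.
SELECT component_id FROM components;
```

With this layout the application has to register a new component in components before logging against it, which is the price paid for the cheap lookup.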

Answer 2

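Create a non-clustered index on the table (component_id), or define non-clustered indexes for all the fields you use in the WHERE clause. Then compare the execution times or look at the execution plan. The goal is to turn scans into seek operations.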

Answer 3

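If you have a list of valid component IDs in another table and only want to check whether they exist in the audit table, optionally subject to some conditions, then you could do: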

select
  component_id
from
  components
where
  exists (
    select null
    from   audit_log
    where  audit_log.component_id = components.component_id)

This will perform better if the number of distinct component_id values is significantly smaller than the number of rows in audit_log and audit_log.component_id is indexed.
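
When no separate components table exists, the same idea of probing the index once per distinct value can be emulated with a recursive CTE, a technique often called a "loose index scan". This is a different approach from the query above, offered only as a sketch; it assumes the btree index on audit_log.component_id from the question and PostgreSQL 8.4 or later:

```sql
WITH RECURSIVE t AS (
   -- Start from the smallest component_id (NULLs sort last by default).
   (SELECT component_id
    FROM   audit_log
    ORDER  BY component_id
    LIMIT  1)
   UNION ALL
   -- Each step jumps to the next larger value with a single index probe.
   SELECT (SELECT component_id
           FROM   audit_log
           WHERE  component_id > t.component_id
           ORDER  BY component_id
           LIMIT  1)
   FROM   t
   WHERE  t.component_id IS NOT NULL
)
-- The final probe finds nothing and yields NULL, which is filtered out here.
SELECT component_id
FROM   t
WHERE  component_id IS NOT NULL;
```

Unlike GROUP BY, this variant does not return a row for NULL component_id values, and it only pays off under the same condition stated above: few distinct values relative to the row count.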

PostgreSQL: selecting from a table with 5 million rows
