subQuery with groupBy is much slower than querying via joins

Asked: 2016-06-16 18:33:32

Tags: performance postgresql sequelize.js

I'm using sequelize to run some queries on my postgres database. Because of the pagination I'm doing, I found I have to query using a subQuery and group by the main query's primary key. While this fixed the problem of not getting a full page of results, the query is much slower (3200ms vs 60ms). Unfortunately, I'm not enough of a SQL expert to see what I could do to speed it up so it performs well.

The sequelize query I'm running is:

var query = {
    limit: 10,
    where: {},
    include: [{model: db.FinancialCompany, through:{where:{address_zip:req.query.zip}}, required:true}, {model: db.Disclosure, required: false}],
    order: [['last_name', 'ASC']],
    groupBy: ['FinancialProfessional.id'],
    subQuery: true
  }
  db.FinancialProfessional.findAndCount(
      query
  ).then(function (professionals) {
    res.jsonp(professionals);
    return professionals;
  })

which translates to:

SELECT "FinancialProfessional".*,
       "FinancialCompanies"."id" AS "FinancialCompanies.id",
       "FinancialCompanies"."name" AS "FinancialCompanies.name",
       "FinancialCompanies"."address_street" AS "FinancialCompanies.address_street",
       "FinancialCompanies"."address_city" AS "FinancialCompanies.address_city",
       "FinancialCompanies"."address_state" AS "FinancialCompanies.address_state",
       "FinancialCompanies"."address_zip" AS "FinancialCompanies.address_zip",
       "FinancialCompanies"."crd" AS "FinancialCompanies.crd",
       "FinancialCompanies"."createdAt" AS "FinancialCompanies.createdAt",
       "FinancialCompanies"."updatedAt" AS "FinancialCompanies.updatedAt",
       "FinancialCompanies.ProfessionalToCompany"."address_street" AS "FinancialCompanies.ProfessionalToCompany.address_street",
       "FinancialCompanies.ProfessionalToCompany"."address_city" AS "FinancialCompanies.ProfessionalToCompany.address_city",
       "FinancialCompanies.ProfessionalToCompany"."address_state" AS "FinancialCompanies.ProfessionalToCompany.address_state",
       "FinancialCompanies.ProfessionalToCompany"."address_zip" AS "FinancialCompanies.ProfessionalToCompany.address_zip",
       "FinancialCompanies.ProfessionalToCompany"."createdAt" AS "FinancialCompanies.ProfessionalToCompany.createdAt",
       "FinancialCompanies.ProfessionalToCompany"."updatedAt" AS "FinancialCompanies.ProfessionalToCompany.updatedAt",
       "FinancialCompanies.ProfessionalToCompany"."FinancialCompanyId" AS "FinancialCompanies.ProfessionalToCompany.FinancialCompanyId",
       "FinancialCompanies.ProfessionalToCompany"."FinancialProfessionalId" AS "FinancialCompanies.ProfessionalToCompany.FinancialProfessionalId",
       "Disclosures"."id" AS "Disclosures.id",
       "Disclosures"."info" AS "Disclosures.info",
       "Disclosures"."createdAt" AS "Disclosures.createdAt",
       "Disclosures"."updatedAt" AS "Disclosures.updatedAt",
       "Disclosures"."FinancialProfessionalId" AS "Disclosures.FinancialProfessionalId",
       "Disclosures"."RegulatoryAgencyId" AS "Disclosures.RegulatoryAgencyId"
FROM
  (SELECT "FinancialProfessional"."id",
          "FinancialProfessional"."full_name",
          "FinancialProfessional"."last_name",
          "FinancialProfessional"."alternate_names",
          "FinancialProfessional"."title",
          "FinancialProfessional"."crd",
          "FinancialProfessional"."licensed",
          "FinancialProfessional"."display_count",
          "FinancialProfessional"."years_f",
          "FinancialProfessional"."years_s",
          "FinancialProfessional"."createdAt",
          "FinancialProfessional"."updatedAt",
          "FinancialProfessional"."UserId"
   FROM "FinancialProfessionals" AS "FinancialProfessional"
   WHERE
       (SELECT "ProfessionalToCompany"."FinancialCompanyId"
        FROM "ProfessionalToCompanies" AS "ProfessionalToCompany"
        INNER JOIN "FinancialCompanies" AS "FinancialCompany" ON "ProfessionalToCompany"."FinancialCompanyId" = "FinancialCompany"."id"
        WHERE ("FinancialProfessional"."id" = "ProfessionalToCompany"."FinancialProfessionalId"
               AND "ProfessionalToCompany"."address_zip" = '94596') LIMIT 1) IS NOT NULL
   GROUP BY "FinancialProfessional"."id"
   ORDER BY "FinancialProfessional"."last_name" ASC LIMIT 10) AS "FinancialProfessional"
INNER JOIN ("ProfessionalToCompanies" AS "FinancialCompanies.ProfessionalToCompany"
            INNER JOIN "FinancialCompanies" AS "FinancialCompanies" ON "FinancialCompanies"."id" = "FinancialCompanies.ProfessionalToCompany"."FinancialCompanyId"
            AND "FinancialCompanies.ProfessionalToCompany"."address_zip" = '94596') ON "FinancialProfessional"."id" = "FinancialCompanies.ProfessionalToCompany"."FinancialProfessionalId"
LEFT OUTER JOIN "Disclosures" AS "Disclosures" ON "FinancialProfessional"."id" = "Disclosures"."FinancialProfessionalId"
ORDER BY "FinancialProfessional"."last_name" ASC;

Running EXPLAIN ANALYZE on the query gives me:

Nested Loop Left Join  (cost=17155066.40..17155166.22 rows=1 width=2423) (actual time=5098.656..5098.780 rows=12 loops=1)
  ->  Nested Loop  (cost=17155065.98..17155157.78 rows=1 width=2343) (actual time=5098.648..5098.736 rows=10 loops=1)
        ->  Nested Loop  (cost=17155065.69..17155149.94 rows=1 width=227) (actual time=5098.642..5098.702 rows=10 loops=1)
              ->  Limit  (cost=17155065.27..17155065.29 rows=10 width=161) (actual time=5098.618..5098.624 rows=10 loops=1)
                    ->  Sort  (cost=17155065.27..17158336.49 rows=1308489 width=161) (actual time=5098.617..5098.618 rows=10 loops=1)
                          Sort Key: "FinancialProfessional".last_name
                          Sort Method: top-N heapsort  Memory: 27kB
                          ->  Group  (cost=0.43..17126789.29 rows=1308489 width=161) (actual time=10.895..5096.539 rows=909 loops=1)
                                Group Key: "FinancialProfessional".id
                                ->  Index Scan using "FinancialProfessionals_pkey" on "FinancialProfessionals" "FinancialProfessional"  (cost=0.43..17123518.07 rows=1308489 width=161) (actual time=10.893..5095.345 rows=909 loops=1)
                                      Filter: ((SubPlan 1) IS NOT NULL)
                                      Rows Removed by Filter: 1314155
                                      SubPlan 1
                                        ->  Limit  (cost=0.71..12.76 rows=1 width=4) (actual time=0.003..0.003 rows=0 loops=1315064)
                                              ->  Nested Loop  (cost=0.71..12.76 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=1315064)
                                                    ->  Index Scan using "ProfessionalToCompanies_pkey" on "ProfessionalToCompanies" "ProfessionalToCompany"  (cost=0.42..8.45 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=1315064)
                                                          Index Cond: ("FinancialProfessional".id = "FinancialProfessionalId")
                                                          Filter: ((address_zip)::text = '94596'::text)
                                                          Rows Removed by Filter: 1
                                                    ->  Index Only Scan using "FinancialCompanies_pkey" on "FinancialCompanies" "FinancialCompany"  (cost=0.29..4.30 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=909)
                                                          Index Cond: (id = "ProfessionalToCompany"."FinancialCompanyId")
                                                          Heap Fetches: 0
              ->  Index Scan using "ProfessionalToCompanies_pkey" on "ProfessionalToCompanies" "FinancialCompanies.ProfessionalToCompany"  (cost=0.42..8.45 rows=1 width=66) (actual time=0.006..0.006 rows=1 loops=10)
                    Index Cond: ("FinancialProfessionalId" = "FinancialProfessional".id)
                    Filter: ((address_zip)::text = '94596'::text)
        ->  Index Scan using "FinancialCompanies_pkey" on "FinancialCompanies"  (cost=0.29..7.82 rows=1 width=2116) (actual time=0.002..0.002 rows=1 loops=10)
              Index Cond: (id = "FinancialCompanies.ProfessionalToCompany"."FinancialCompanyId")
  ->  Index Scan using fp_d_id on "Disclosures"  (cost=0.42..8.44 rows=1 width=80) (actual time=0.003..0.003 rows=0 loops=10)
        Index Cond: ("FinancialProfessional".id = "FinancialProfessionalId")
Planning time: 0.644 ms
Execution time: 5098.873 ms

Schema:

CREATE TABLE public."FinancialProfessionals"
(
  id integer NOT NULL DEFAULT nextval('"FinancialProfessionals_id_seq"'::regclass),
  full_name character varying(255),
  last_name character varying(255),
  alternate_names character varying(255)[],
  title character varying(255)[],
  crd integer,
  licensed boolean,
  "createdAt" timestamp with time zone NOT NULL,
  "updatedAt" timestamp with time zone NOT NULL,
  tsv tsvector,
  "UserId" integer,
  display_count integer DEFAULT 0,
  years_f integer,
  years_s integer,
  CONSTRAINT "FinancialProfessionals_pkey" PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
); 
CREATE INDEX last_name_idx
  ON public."FinancialProfessionals"
  USING btree
  (last_name COLLATE pg_catalog."default");
CREATE INDEX name_idx
  ON public."FinancialProfessionals"
  USING gin
  (tsv);
CREATE INDEX crd_idx
  ON public."FinancialProfessionals"
  USING btree
  (crd);

CREATE TABLE public."ProfessionalToCompanies"
(
  address_street character varying(255),
  address_city character varying(255),
  address_state character varying(255),
  address_zip character varying(255),
  "createdAt" timestamp with time zone NOT NULL,
  "updatedAt" timestamp with time zone NOT NULL,
  "FinancialProfessionalId" integer NOT NULL,
  "FinancialCompanyId" integer NOT NULL,
  CONSTRAINT "ProfessionalToCompanies_pkey" PRIMARY KEY ("FinancialProfessionalId", "FinancialCompanyId"),
  CONSTRAINT "ProfessionalToCompanies_FinancialCompanyId_fkey" FOREIGN KEY ("FinancialCompanyId")
      REFERENCES public."FinancialCompanies" (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE,
  CONSTRAINT "ProfessionalToCompanies_FinancialProfessionalId_fkey" FOREIGN KEY ("FinancialProfessionalId")
      REFERENCES public."FinancialProfessionals" (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE
)
WITH (
  OIDS=FALSE
);

CREATE INDEX zip_idx
  ON public."ProfessionalToCompanies"
  USING btree
  (address_zip COLLATE pg_catalog."default");

CREATE TABLE public."FinancialCompanies"
(
  id integer NOT NULL DEFAULT nextval('"FinancialCompanies_id_seq"'::regclass),
  name character varying(255),
  address_street character varying(255),
  address_city character varying(255),
  address_state character varying(255),
  address_zip character varying(255),
  crd integer,
  "createdAt" timestamp with time zone NOT NULL,
  "updatedAt" timestamp with time zone NOT NULL,
  company_name_tsv tsvector,
  years_f integer,
  CONSTRAINT "FinancialCompanies_pkey" PRIMARY KEY (id)
)
WITH (
  OIDS=FALSE
);
CREATE INDEX company_name_idx
  ON public."FinancialCompanies"
  USING gin
  (company_name_tsv);

CREATE TABLE public."Disclosures"
(
  id integer NOT NULL DEFAULT nextval('"Disclosures_id_seq"'::regclass),
  info text,
  "createdAt" timestamp with time zone NOT NULL,
  "updatedAt" timestamp with time zone NOT NULL,
  "FinancialProfessionalId" integer,
  "RegulatoryAgencyId" integer,
  CONSTRAINT "Disclosures_pkey" PRIMARY KEY (id),
  CONSTRAINT "Disclosures_FinancialProfessionalId_fkey" FOREIGN KEY ("FinancialProfessionalId")
      REFERENCES public."FinancialProfessionals" (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE SET NULL,
  CONSTRAINT "Disclosures_RegulatoryAgencyId_fkey" FOREIGN KEY ("RegulatoryAgencyId")
      REFERENCES public."RegulatoryAgencies" (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE SET NULL
)
WITH (
  OIDS=FALSE
);
CREATE INDEX fp_d_id
  ON public."Disclosures"
  USING btree
  ("FinancialProfessionalId");
CREATE INDEX fp_r_id
  ON public."Disclosures"
  USING btree
  ("RegulatoryAgencyId");

FWIW, the following query runs in about 64ms:

SELECT fp.full_name, array_agg(ptc), array_agg(d)
FROM
  "ProfessionalToCompanies"      ptc
    JOIN "FinancialCompanies"     fc ON ptc."FinancialCompanyId" = fc.id
    JOIN "FinancialProfessionals" fp ON fp.id = ptc."FinancialProfessionalId"
    LEFT OUTER JOIN "Disclosures"  d ON fp.id = d."FinancialProfessionalId"
WHERE ptc.address_zip = '94596'
GROUP BY fp.id
ORDER BY fp.last_name ASC
limit 10

Is there some kind of index or anything else I can add to make this query perform well?

1 answer:

Answer 0 (score: 1):

The obvious candidate index would be one matching your ordering criteria. That way, PostgreSQL can walk the index in order in a nested loop until the LIMIT condition is satisfied. That should certainly help.

But be careful. If many records have to be skipped because of other criteria, such an index can perform quite badly.

Edit

Looking at the EXPLAIN ANALYZE output, what strikes me is that the subquery's nested loop comes up empty most of the time, yet runs 1.3 million times. That actually accounts for most of the grouping time you report. The sort itself is very fast, because by that point there are hardly any rows left. Maybe try an index on last_name and id, in that order?
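A composite index along those lines would look like the following (a sketch; the index name last_name_id_idx is made up, and whether the planner actually uses it depends on the query shape):

```sql
-- Hypothetical composite index: its leading column matches the
-- ORDER BY (last_name), and it includes the grouped primary key (id),
-- so the sort and group could potentially be served from the index
-- in order, stopping once the LIMIT is satisfied.
CREATE INDEX last_name_id_idx
  ON public."FinancialProfessionals"
  USING btree
  (last_name, id);
```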

I'm not entirely sure at this point. Also check your GEQO settings.
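For reference, the genetic query optimizer (GEQO) settings can be inspected with standard PostgreSQL commands like these:

```sql
-- GEQO replaces exhaustive join planning for queries joining
-- at least geqo_threshold tables, which can yield worse plans.
SHOW geqo;            -- whether GEQO is enabled
SHOW geqo_threshold;  -- default is 12

-- Or list all GEQO-related parameters at once:
SELECT name, setting FROM pg_settings WHERE name LIKE 'geqo%';
```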

Edit 2

The problem I see reading the analyze output is that you have the aggregation happening inside a subquery that then has to be used in the WHERE clause. That would explain why using subQuery negatively affects performance.

Then you have the LIMIT, which makes PostgreSQL think "hey, I can do a nested loop here, and it will probably be faster because I can stop after finding 10 rows". But as it works through the nested loop, it almost never finds a row, so the result is a really bad plan.

I don't see an easy way to optimize this through the ORM without adding some other layer.