Slow query ordering by a column in a joined table

Question

Introducing an ORDER BY clause in a query increases the total time, because the db has to do extra work to sort the result set:

copy the resulting tuples into some temporary memory
sort them (hopefully in memory, otherwise on disk)
stream the result to the client
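To make those three steps concrete, here is a minimal Python sketch of what a sort step does conceptually: buffer the rows, sort the buffer by a composite key, then hand the rows out one at a time. It is a toy model with obvious simplifications (no memory limit, no spilling to disk), not PostgreSQL's executor; the function `order_by` and the sample rows are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]

def order_by(rows: Iterable[Row], key: Callable[[Row], Any]) -> Iterator[Row]:
    """Hypothetical toy model of a sort step: materialize, sort, stream."""
    # 1) copy the produced rows into a temporary buffer
    buffer: List[Row] = list(rows)
    # 2) sort the buffer in memory (a real database may spill to disk instead)
    buffer.sort(key=key)
    # 3) stream the sorted rows to the consumer, one at a time
    yield from buffer

# Made-up rows shaped like the query output
products = [
    {"domain": "example.com", "ordering": 2, "name": "b"},
    {"domain": "example.com", "ordering": 1, "name": "z"},
    {"domain": "example.com", "ordering": 1, "name": "a"},
]

for row in order_by(products, key=lambda r: (r["domain"], r["ordering"], r["name"])):
    print(row["ordering"], row["name"])
```

Nothing is emitted until the whole input has been read and sorted, which is why the Sort node's startup time in the plans includes the time of everything below it.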

What I'm missing is why adding just one column from the joined table makes such a big difference in performance.

Query 1

EXPLAIN ANALYZE
SELECT p.*
FROM product_product p
JOIN django_site d ON (p.site_id = d.id)
WHERE (p.active = true  AND p.site_id = 1 )
ORDER BY d.domain, p.ordering, p.name

Query plan

Sort  (cost=3909.83..3952.21 rows=16954 width=1086) (actual time=1120.618..1143.922 rows=16946 loops=1)
   Sort Key: django_site.domain, product_product.ordering, product_product.name
   Sort Method:  quicksort  Memory: 25517kB
   ->  Nested Loop  (cost=0.00..2718.86 rows=16954 width=1086) (actual time=0.053..87.396 rows=16946 loops=1)
         ->  Seq Scan on django_site  (cost=0.00..1.01 rows=1 width=24) (actual time=0.010..0.012 rows=1 loops=1)
               Filter: (id = 1)
         ->  Seq Scan on product_product  (cost=0.00..2548.31 rows=16954 width=1066) (actual time=0.036..44.138 rows=16946 loops=1)
               Filter: (product_product.active AND (product_product.site_id = 1))
 Total runtime: 1182.515 ms

Query 2

Same as above, but without sorting by django_site.domain

Query plan

 Sort  (cost=3909.83..3952.21 rows=16954 width=1066) (actual time=257.094..278.905 rows=16946 loops=1)
   Sort Key: product_product.ordering, product_product.name
   Sort Method:  quicksort  Memory: 25161kB
   ->  Nested Loop  (cost=0.00..2718.86 rows=16954 width=1066) (actual time=0.075..86.120 rows=16946 loops=1)
         ->  Seq Scan on django_site  (cost=0.00..1.01 rows=1 width=4) (actual time=0.015..0.017 rows=1 loops=1)
               Filter: (id = 1)
         ->  Seq Scan on product_product  (cost=0.00..2548.31 rows=16954 width=1066) (actual time=0.052..44.024 rows=16946 loops=1)
               Filter: (product_product.active AND (product_product.site_id = 1))
 Total runtime: 305.392 ms

This question may be related.

Edit: added more details

                        Table "public.product_product"
  Column  |          Type          |                          Modifiers
----------+------------------------+--------------------------------------------------------------
 id       | integer                | not null default nextval('product_product_id_seq'::regclass)
 site_id  | integer                | not null
 name     | character varying(255) | not null
 slug     | character varying(255) | not null
 sku      | character varying(255) |
 ordering | integer                | not null
 [snip some columns]

Indexes:
    "product_product_pkey" PRIMARY KEY, btree (id)
    "product_product_site_id_key" UNIQUE, btree (site_id, sku)
    "product_product_site_id_key1" UNIQUE, btree (site_id, slug)
    "product_product_site_id" btree (site_id)
    "product_product_slug" btree (slug)
    "product_product_slug_like" btree (slug varchar_pattern_ops)


                  Table "public.django_site"
 Column |          Type          |                        Modifiers
--------+------------------------+----------------------------------------------------------
 id     | integer                | not null default nextval('django_site_id_seq'::regclass)
 domain | character varying(100) | not null
 name   | character varying(50)  | not null
Indexes:
    "django_site_pkey" PRIMARY KEY, btree (id)

Postgres version 8.4

Some table statistics:

# select count(*) from django_site;
 count 
-------
     1

# select count(*) from product_product;
 count 
-------
 17540

# select active, count(*) from product_product group by active;
 active | count 
--------+-------
 f      |   591
 t      | 16949

# select site_id, count(*) from product_product group by site_id;
 site_id | count 
---------+-------
       1 | 17540

Answer 1

Test case

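PostgreSQL 9.1. Tested on a database with limited resources, but that is more than enough for a small case like this. The locale used for collation will be relevant: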

SHOW LC_COLLATE;

 de_AT.UTF-8

Step 1) Recreate the original test environment

-- DROP SCHEMA x CASCADE;
CREATE SCHEMA x;  -- test schema

-- DROP TABLE x.django_site;
CREATE TABLE x.django_site (
id serial primary key
,domain character varying(100) not null
,int_col int not null
);
INSERT INTO x.django_site values (1,'www.testsite.com/foodir/', 3);

-- DROP TABLE x.product;
CREATE TABLE x.product (
 id serial primary key
,site_id integer not null
,name character varying(255) not null
,slug character varying(255) not null
,sku character varying(255) 
,ordering integer not null
,active boolean not null
);

INSERT INTO x.product (site_id, name, slug, sku, ordering, active)
SELECT 1
    ,repeat(chr((random() * 255)::int + 32), (random()*255)::int)
    ,repeat(chr((random() * 255)::int + 32), (random()*255)::int)
    ,repeat(chr((random() * 255)::int + 32), (random()*255)::int)
    ,i -- ordering in sequence
    ,NOT (random()* 0.5174346569119122)::int::bool
FROM generate_series(1, 17540) AS x(i);
-- SELECT 0.5 / (1 - (591::float8 / 17540))
-- = 0.5174346569119122

CREATE INDEX product_site_id on x.product(site_id);
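The constant 0.5174346569119122 in the INSERT is chosen so that roughly 591 of 17540 generated rows come out inactive, matching the statistics in the question. The cast (random() * factor)::int yields 1 only when random() exceeds 0.5 / factor, so the factor has to satisfy 1 - 0.5 / factor = 591 / 17540, i.e. factor = 0.5 / (1 - 591 / 17540). A quick Python check of that arithmetic, plus a simulation of the expression; Python's random generator and the seed 42 are stand-ins for PostgreSQL's random(), so the simulated count is only expected to land near 591.

```python
import random

TOTAL_ROWS = 17540    # count(*) from product_product
INACTIVE_ROWS = 591   # rows with active = false

p_inactive = INACTIVE_ROWS / TOTAL_ROWS
# round(r * factor) is 1 exactly when r > 0.5 / factor,
# so pick factor such that P(r > 0.5 / factor) == p_inactive
factor = 0.5 / (1 - p_inactive)
print(factor)

# Simulate NOT (random() * factor)::int::bool for every row.
# Python's round() rounds to the nearest integer, like the ::int cast.
rng = random.Random(42)
active_flags = [not bool(round(rng.random() * factor)) for _ in range(TOTAL_ROWS)]
inactive = active_flags.count(False)
print(inactive)  # should land near 591
```

Note that 0.5 / (1 - p) equals 0.5 + 0.5 * p / (1 - p), so the part of the factor above 0.5 is 0.5 * p / (1 - p).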

Step 2) ANALYZE

    ANALYZE x.product;
    ANALYZE x.django_site;

Step 3) Reorder BY random()

-- DROP TABLE x.p;
CREATE TABLE x.p AS
SELECT *
FROM   x.product
ORDER  BY random();

ANALYZE x.p;

Results

EXPLAIN ANALYZE
    SELECT p.*
    FROM   x.p
    JOIN   x.django_site d ON (p.site_id = d.id)
    WHERE  p.active
    AND    p.site_id = 1
--    ORDER  BY d.domain, p.ordering, p.name
--    ORDER  BY p.ordering, p.name
--    ORDER  BY d.id, p.ordering, p.name
--    ORDER  BY d.int_col, p.ordering, p.name
--    ORDER  BY p.name COLLATE "C"
--    ORDER  BY d.domain COLLATE "C", p.ordering, p.name -- dvd's final solution

1) before ANALYZE (-> bitmap index scan)
2) after ANALYZE (-> seq scan)
3) reordered BY random(), ANALYZE

ORDER  BY d.domain, p.ordering, p.name

1) Total runtime: 1253.543 ms
2) Total runtime: 1250.351 ms
3) Total runtime: 1283.111 ms

ORDER  BY p.ordering, p.name

1) Total runtime: 177.266 ms
2) Total runtime: 174.556 ms
3) Total runtime: 177.797 ms

ORDER  BY d.id, p.ordering, p.name

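1) Total runtime: 176.628 ms
2) Total runtime: 176.811 ms
3) Total runtime: 178.150 ms
The planner obviously factors in that d.id is functionally dependent: the join condition p.site_id = d.id together with p.site_id = 1 pins d.id to the constant 1, so it can be dropped from the sort key.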

ORDER  BY d.int_col, p.ordering, p.name -- integer column in other table

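1) Total runtime: 242.218 ms - !!
2) Total runtime: 245.234 ms
3) Total runtime: 254.581 ms
The planner obviously misses that d.int_col (NOT NULL) is functionally dependent as well. But sorting by an integer column is cheap.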

ORDER  BY p.name -- varchar(255) in same table

1) Total runtime: 2259.171 ms - !!
2) Total runtime: 2257.650 ms
3) Total runtime: 2258.282 ms
Sorting by a (long) varchar or text column is very expensive ...
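A rough way to see why the cost of each single comparison matters: a comparison sort of n rows calls its comparison routine on the order of n·log2(n) times, so whatever happens inside one comparison is multiplied by that count. As I understand it, with a non-C locale PostgreSQL of this era compares text through the C library's strcoll() on every comparison, while comparing two integers is trivially cheap. The Python sketch below only counts comparisons; it uses Python's built-in sort (Timsort, not PostgreSQL's quicksort), the row count from the plans in the question, and made-up 20-character names.

```python
import functools
import math
import random

comparisons = 0

def counting_compare(a: str, b: str) -> int:
    """Ordinary three-way string comparison that counts its own invocations."""
    global comparisons
    comparisons += 1
    return (a > b) - (a < b)

rng = random.Random(0)
n = 16946  # rows fed into the Sort node in both plans
names = ["".join(rng.choice("abcdefghij") for _ in range(20)) for _ in range(n)]

sorted_names = sorted(names, key=functools.cmp_to_key(counting_compare))
print(f"{comparisons} comparisons for {n} rows (n*log2(n) ~ {n * math.log2(n):.0f})")
```

In the question's plans the sort itself takes roughly 1033 ms in Query 1 (1120.618 - 87.396) versus roughly 171 ms in Query 2 (257.094 - 86.120). Spread over a few hundred thousand comparisons, that gap works out to a few microseconds per comparison. And since django_site.domain is identical in every row, each comparison in Query 1 first pays for a locale-aware comparison of two equal strings before falling through to ordering.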

ORDER  BY p.name COLLATE "C"

1) Total runtime: 327.516 ms - !!
2) Total runtime: 325.103 ms
3) Total runtime: 327.206 ms
... but not nearly as expensive without a locale.

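With the locale out of the way, sorting by a varchar column is not quite as fast as sorting by an integer column, but almost. Locale "C" is effectively "no locale, just order by byte value". I quote the manual:

If you want the system to behave as if it had no locale support, use the special locale name C, or equivalently POSIX.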
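To make "order by byte value" concrete: under collation "C" strings are compared by their raw bytes, so all uppercase ASCII letters sort before all lowercase ones, and a non-ASCII letter such as "Ä" (two bytes starting with 0xC3 in UTF-8) sorts after "z". The Python snippet below mimics that with made-up words. A German locale such as de_AT.UTF-8 would typically order them roughly as apfel, Äpfel, banane, Zebra instead, and computing that order is the extra work the locale costs.

```python
words = ["banane", "Zebra", "Äpfel", "apfel"]

# Collation "C": compare the UTF-8 encoded bytes and nothing else
c_order = sorted(words, key=lambda s: s.encode("utf-8"))
print(c_order)  # ['Zebra', 'apfel', 'banane', 'Äpfel']
```

Python's locale.strxfrm can produce a locale-aware order as well, but only if the locale is installed on the machine, so it is left out of this sketch.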

All things considered, @dvd opted for:

ORDER  BY d.domain COLLATE "C", p.ordering, p.name

... 3) Total runtime: 275.854 ms
That should do it.

Answer 2

The output of EXPLAIN ANALYZE is identical up to the sort operation, so the sort makes the difference.

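In both queries you return all rows of product_product, but in the first case you sort by a column of django_site, so django_site.domain has to be retrieved additionally, which costs extra. But that would not explain the big difference.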

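There is a good chance that the physical order of the rows in product_product already follows the column ordering, which would make the sort in case 2 very cheap and the sort in case 1 expensive.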

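After "more details added":
It is also considerably more expensive to sort by a character varying(100) column than by an integer column. Besides integers being much smaller, there is also collation support that slows you down. To verify, try sorting with COLLATE "C". Read more about collation support in the manual. That is, if you were running PostgreSQL 9.1; I see now that you have PostgreSQL 8.4.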

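Obviously, all rows in the query output have the same value for django_site.domain, because you filter on p.site_id = 1. If the query planner were smarter, it could skip the first column for sorting to begin with.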
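Skipping that column would not change the result: when the leading sort key is the same in every row, it never decides a comparison, so ORDER BY domain, ordering, name and ORDER BY ordering, name yield the same order. A small Python check with made-up rows that all share one placeholder domain:

```python
import random

rng = random.Random(1)
# Made-up rows shaped like (domain, ordering, name), all with the same domain
rows = [("www.example.com", rng.randint(1, 50), rng.choice("abcxyz")) for _ in range(1000)]

with_domain = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
without_domain = sorted(rows, key=lambda r: (r[1], r[2]))
print(with_domain == without_domain)  # True
```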

You run PostgreSQL 8.4. The query planner has become considerably smarter in 9.1. Upgrading might change the situation, but I can't say for sure.

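To verify my theory about the physical order, you could make a copy of your big table with the rows inserted in random order and run the query again. Like this: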

CREATE TABLE p AS
SELECT *
FROM   public.product_product
ORDER  BY random();

Then:

EXPLAIN ANALYZE
SELECT p.*
FROM   p
JOIN   django_site d ON (p.site_id = d.id)
WHERE  p.active
AND    p.site_id = 1
ORDER  BY d.domain, p.ordering, p.name;

Any difference? -> Obviously that doesn't explain it ...

OK, to test whether the varchar(100) makes a difference, I recreated your scenario. See the separate answer with a detailed test case and benchmark. This answer is overloaded as it is.

To sum it up:
Turns out, my other explanation fits. The main reason for the slowdown is obviously sorting by a varchar(100) column according to a locale (LC_COLLATE).

I added some explanation and a link to the test case. The results should speak for themselves.

Answer 3

As far as I can see, you need some index:

CREATE INDEX product_product_idx01 ON product_product (active, site_id);

This might speed up your query.
And why order by domain? That is a meaningless query.
