Inconsistent performance of an ORDER BY query

Asked: 2013-09-25 20:36:09

Tags: python django orm django-models indexing

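I have a Django application backed by a PostgreSQL 9.1.9 database that runs on its own dedicated machine with 2 GB of RAM. The database stores a cache of tweets (about 1 million) and indexes them by the words they contain. Here are the relevant models: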

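from django.db import models

# Assumed value, inferred from the character varying(10000) columns in the table definitions below
STANDARD_MAX_LEN = 10000

class TwitterPassage(models.Model):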
    third_party_id = models.CharField(max_length=STANDARD_MAX_LEN, db_index=True, unique=True)
    third_party_created = models.DateTimeField(null=True, db_index=True)
    source = models.CharField(max_length=STANDARD_MAX_LEN)
    text = models.CharField(max_length=STANDARD_MAX_LEN)
    author = models.CharField(max_length=STANDARD_MAX_LEN)
    words = models.ManyToManyField('connectr.Word')
    quality = models.BigIntegerField(null=True, blank=True, db_index=True)
    author_fk = models.ForeignKey('connectr.TwitterUser', null=True)

class Word(models.Model):
    word = models.CharField(max_length=STANDARD_MAX_LEN, db_index=True, unique=True)
    display_word = models.CharField(max_length=STANDARD_MAX_LEN, default='', blank=True)
    passage_count = models.IntegerField(null=True, db_index=True, blank=True)

class User(models.Model):
    user_id = models.CharField(max_length=STANDARD_MAX_LEN, db_index=True)
    tweet_passages = models.ManyToManyField('connectr.TwitterPassage', through='connectr.PassageViewEvent')

A Word has a many-to-many relationship with every TwitterPassage that contains that word.

The query I am running is:

# Exclude tweets this user has already seen, and find the 20 highest quality tweets they haven't yet seen
word.twitterpassage_set.exclude(user=current_user).order_by('-quality')[:20]

Quality is an integer score ranging from roughly 0 to 300.

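Sometimes this query is as fast as I need it to be (under a second). Other times it is very slow, taking up to 10 seconds. The slowness seems most pronounced for very common words such as "their" or "my", and less so for rarer words that are associated with fewer TwitterPassages.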

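I have 8 indexed fields on the TwitterPassage model and 5 on the Word model. Is this simply a sign that I need more RAM or fewer indexes? How would I determine which of those might fix the problem?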

In case it helps, here is some information about the size of the database:

                            relation                                |  size   
------------------------------------------------------------------------+---------
 public.connectr_twitterpassage_words_word_id                           | 1680 MB
 public.connectr_twitterpassage_twitterpassage_id_613c80271f09fba8_uniq | 1199 MB
 public.connectr_twitterpassage_words_pkey                              | 1010 MB
 public.connectr_twitterpassage_words                                   | 1009 MB
 public.connectr_twitterpassage_words_twitterpassage_id                 | 1002 MB
 public.connectr_twitterpassage                                         | 620 MB
 public.connectr_twitteruser                                            | 449 MB
 public.connectr_twitterpassage_created                                 | 256 MB
 public.connectr_passage_source_like                                    | 230 MB
 public.connectr_passage_source                                         | 229 MB
 public.connectr_twitterpassage_is_top_tweet                            | 194 MB
 public.connectr_passage_pkey                                           | 187 MB
 public.connectr_word                                                   | 184 MB
 public.connectr_passage_third_party_id_like                            | 181 MB
 public.connectr_passage_third_party_id                                 | 180 MB
 public.connectr_passage_retweet_count                                  | 170 MB
 public.connectr_twitterpassage_third_party_id_uniq                     | 168 MB
 public.connectr_passage_favorited_count                                | 166 MB
 public.connectr_twitterpassage_quality                                 | 159 MB
 public.connectr_twitterpassage_author_fk_id                            | 118 MB

Edit: Per Jakub's suggestion, here is the EXPLAIN ANALYZE output for the query:

 Limit  (cost=37918.71..37918.72 rows=20 width=204) (actual time=1495.133..1495.201 rows=20 loops=1)
   ->  Sort  (cost=37918.71..37919.01 rows=606 width=204) (actual time=1495.129..1495.156 rows=20 loops=1)
         Sort Key: connectr_twitterpassage.quality
         Sort Method: top-N heapsort  Memory: 24kB
         ->  Nested Loop  (cost=18.35..37915.49 rows=606 width=204) (actual time=0.301..1485.234 rows=1249 loops=1)
               ->  Index Scan using connectr_twitterpassage_words_word_id on connectr_twitterpassage_words  (cost=0.00..4905.80 rows=1212 width=4) (actual time=0.091..812.018 rows=1249 loops=1)
                     Index Cond: (word_id = 18890456)
               ->  Index Scan using connectr_passage_pkey on connectr_twitterpassage  (cost=18.35..27.23 rows=1 width=204) (actual time=0.515..0.525 rows=1 loops=1249)
                     Index Cond: (id = connectr_twitterpassage_words.twitterpassage_id)
                     Filter: ((NOT (hashed SubPlan 1)) OR (id IS NULL))
                     SubPlan 1
                       ->  Index Scan using connectr_passageviewevent_user_id on connectr_passageviewevent u1  (cost=0.00..18.34 rows=6 width=4) (actual time=0.033..0.091 rows=5 loops=1)
                             Index Cond: (user_id = 1)
                             Filter: (passage_id IS NOT NULL)
 Total runtime: 1495.700 ms
(15 rows)

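After running the above query for several different words, some are very fast (~200 ms) while others are much slower (~1500 ms or more). If I run the same query several times, it is faster from the second run onward (I assume it is cached?).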
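
Read from the innermost node outward, the plan above fetches the join-table rows for one word, drops passages whose id appears in a hashed set of passages the user has already viewed ("hashed SubPlan 1"), and keeps only the 20 highest-quality rows ("top-N heapsort"). A minimal Python sketch of the last two steps follows. It assumes hypothetical in-memory rows with no NULL quality values; the `top_unseen` helper and the sample data are illustrative and not part of the application.

```python
import heapq


def top_unseen(candidates, seen_ids, limit=20):
    """Filter out already-seen passages, then keep the `limit` best by quality."""
    # Plays the role of "hashed SubPlan 1": built once, probed once per row.
    seen = set(seen_ids)
    unseen = (row for row in candidates if row["id"] not in seen)
    # Plays the role of the top-N heapsort: only `limit` rows are retained.
    return heapq.nlargest(limit, unseen, key=lambda row: row["quality"])


# Hypothetical data: 1249 passages for one word, quality values within 0..300.
candidates = [{"id": i, "quality": i % 301} for i in range(1249)]
seen_ids = [300, 601]  # ids the user has already viewed (made up for the sketch)

result = top_unseen(candidates, seen_ids)
print(len(result), "rows, best quality:", result[0]["quality"])
```

The sort itself only ever holds 20 rows, which matches the small `Memory: 24kB` figure in the plan. Most of the reported time is spent in the two index scans that feed the sort, not in the sort.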

Here are the table definitions:

                                       Table "public.connectr_word"
       Column        |           Type           |                         Modifiers                          
---------------------+--------------------------+------------------------------------------------------------
 id                  | integer                  | not null default nextval('connectr_word_id_seq'::regclass)
 word                | character varying(10000) | not null
 created             | timestamp with time zone | not null
 modified            | timestamp with time zone | not null
 frequency           | double precision         | 
 is_username         | boolean                  | not null
 is_hashtag          | boolean                  | not null
 cloud_eligible      | boolean                  | not null
 passage_count       | integer                  | 
 avg_quality         | double precision         | 
 last_twitter_search | timestamp with time zone | 
 cloud_approved      | boolean                  | not null
 display_word        | character varying(10000) | not null
 is_trend            | boolean                  | not null
Indexes:
    "connectr_word_pkey" PRIMARY KEY, btree (id)
    "connectr_word_word_uniq" UNIQUE CONSTRAINT, btree (word)
    "connectr_word_avg_quality" btree (avg_quality)
    "connectr_word_cloud_eligible" btree (cloud_eligible)
    "connectr_word_last_twitter_search" btree (last_twitter_search)
    "connectr_word_passage_count" btree (passage_count)
    "connectr_word_word" btree (word)
Referenced by:
    TABLE "connectr_passageviewevent" CONSTRAINT "source_word_id_refs_id_178d46eb" FOREIGN KEY (source_word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "connectr_wordmatchrewardevent" CONSTRAINT "tapped_word_id_refs_id_c2ffb369" FOREIGN KEY (tapped_word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "connectr_connection" CONSTRAINT "word_id_refs_id_00cccde2" FOREIGN KEY (word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "connectr_twitterpassage_words" CONSTRAINT "word_id_refs_id_64f49629" FOREIGN KEY (word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED


                                         Table "public.connectr_twitterpassage"
         Column         |           Type           |                              Modifiers                               
------------------------+--------------------------+----------------------------------------------------------------------
 id                     | integer                  | not null default nextval('connectr_twitterpassage_id_seq'::regclass)
 third_party_id         | character varying(10000) | not null
 source                 | character varying(10000) | not null
 text                   | character varying(10000) | not null
 author                 | character varying(10000) | not null
 raw_data               | character varying(10000) | not null
 created                | timestamp with time zone | not null
 modified               | timestamp with time zone | not null
 third_party_created    | timestamp with time zone | 
 retweet_count          | integer                  | not null
 favorited_count        | integer                  | not null
 lang                   | character varying(10000) | not null
 location               | character varying(10000) | not null
 author_followers_count | integer                  | not null
 is_retweet             | boolean                  | not null
 url                    | character varying(10000) | not null
 author_fk_id           | integer                  | 
 quality                | bigint                   | 
 is_top_tweet           | boolean                  | not null
Indexes:
    "connectr_passage_pkey" PRIMARY KEY, btree (id)
    "connectr_twitterpassage_third_party_id_uniq" UNIQUE CONSTRAINT, btree (third_party_id)
    "connectr_passage_author_followers_count" btree (author_followers_count)
    "connectr_passage_favorited_count" btree (favorited_count)
    "connectr_passage_retweet_count" btree (retweet_count)
    "connectr_passage_source" btree (source)
    "connectr_passage_source_like" btree (source varchar_pattern_ops)
    "connectr_passage_third_party_id" btree (third_party_id)
    "connectr_passage_third_party_id_like" btree (third_party_id varchar_pattern_ops)
    "connectr_twitterpassage_author_fk_id" btree (author_fk_id)
    "connectr_twitterpassage_created" btree (created)
    "connectr_twitterpassage_is_top_tweet" btree (is_top_tweet)
    "connectr_twitterpassage_quality" btree (quality)
    "connectr_twitterpassage_third_party_created" btree (third_party_created)
Foreign-key constraints:
    "author_fk_id_refs_id_074720a5" FOREIGN KEY (author_fk_id) REFERENCES connectr_twitteruser(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    TABLE "connectr_passageviewevent" CONSTRAINT "passage_id_refs_id_892b36a6" FOREIGN KEY (passage_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "connectr_connection" CONSTRAINT "twitter_from_id_refs_id_8adbab24" FOREIGN KEY (twitter_from_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "connectr_connection" CONSTRAINT "twitter_to_id_refs_id_8adbab24" FOREIGN KEY (twitter_to_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "connectr_twitterpassage_words" CONSTRAINT "twitterpassage_id_refs_id_720f772f" FOREIGN KEY (twitterpassage_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED


                                           Table "public.connectr_user"
           Column           |           Type           |                         Modifiers                          
----------------------------+--------------------------+------------------------------------------------------------
 id                         | integer                  | not null default nextval('connectr_user_id_seq'::regclass)
 user_id                    | character varying(10000) | not null
 reference_name             | character varying(10000) | not null
 created                    | timestamp with time zone | 
 modified                   | timestamp with time zone | 
 score                      | integer                  | not null
 twitter_screen_name        | character varying(10000) | not null
 twitter_oauth_token        | character varying(10000) | not null
 twitter_oauth_token_secret | character varying(10000) | not null
 twitter_keys_last_used     | timestamp with time zone | not null
Indexes:
    "connectr_user_pkey" PRIMARY KEY, btree (id)
    "connectr_user_score" btree (score)
    "connectr_user_user_id" btree (user_id)
    "connectr_user_user_id_like" btree (user_id varchar_pattern_ops)
Referenced by:
    TABLE "connectr_connection" CONSTRAINT "user_id_refs_id_366cf6e8" FOREIGN KEY (user_id) REFERENCES connectr_user(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "connectr_passageviewevent" CONSTRAINT "user_id_refs_id_478f94a2" FOREIGN KEY (user_id) REFERENCES connectr_user(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "connectr_user_reddit_passages" CONSTRAINT "user_id_refs_id_488fdfea" FOREIGN KEY (user_id) REFERENCES connectr_user(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "connectr_wordmatchrewardevent" CONSTRAINT "user_id_refs_id_8a36f38a" FOREIGN KEY (user_id) REFERENCES connectr_user(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "connectr_user_book_passages" CONSTRAINT "user_id_refs_id_e830956b" FOREIGN KEY (user_id) REFERENCES connectr_user(id) DEFERRABLE INITIALLY DEFERRED

                                      Table "public.connectr_passageviewevent"
     Column     |           Type           |                               Modifiers                                
----------------+--------------------------+------------------------------------------------------------------------
 id             | integer                  | not null default nextval('connectr_passageviewevent_id_seq'::regclass)
 passage_id     | integer                  | not null
 user_id        | integer                  | not null
 source_word_id | integer                  | not null
 next_id        | integer                  | 
 connection_id  | integer                  | 
 date           | timestamp with time zone | not null
Indexes:
    "connectr_passageviewevent_pkey" PRIMARY KEY, btree (id)
    "connectr_passageviewevent_connection_id" btree (connection_id)
    "connectr_passageviewevent_date" btree (date)
    "connectr_passageviewevent_next_id" btree (next_id)
    "connectr_passageviewevent_passage_id" btree (passage_id)
    "connectr_passageviewevent_source_word_id" btree (source_word_id)
    "connectr_passageviewevent_user_id" btree (user_id)
Foreign-key constraints:
    "connection_id_refs_id_a3ff7fc2" FOREIGN KEY (connection_id) REFERENCES connectr_connection(id) DEFERRABLE INITIALLY DEFERRED
    "next_id_refs_id_f737727c" FOREIGN KEY (next_id) REFERENCES connectr_passageviewevent(id) DEFERRABLE INITIALLY DEFERRED
    "passage_id_refs_id_892b36a6" FOREIGN KEY (passage_id) REFERENCES connectr_twitterpassage(id) DEFERRABLE INITIALLY DEFERRED
    "source_word_id_refs_id_178d46eb" FOREIGN KEY (source_word_id) REFERENCES connectr_word(id) DEFERRABLE INITIALLY DEFERRED
    "user_id_refs_id_478f94a2" FOREIGN KEY (user_id) REFERENCES connectr_user(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    TABLE "connectr_passageviewevent" CONSTRAINT "next_id_refs_id_f737727c" FOREIGN KEY (next_id) REFERENCES connectr_passageviewevent(id) DEFERRABLE INITIALLY DEFERRED

Here is the raw SQL of the query that is (sometimes) slow, as generated by Django:

SELECT "connectr_twitterpassage"."id", "connectr_twitterpassage"."third_party_id", "connectr_twitterpassage"."third_party_created", "connectr_twitterpassage"."source", "connectr_twitterpassage"."text", "connectr_twitterpassage"."author", "connectr_twitterpassage"."raw_data", "connectr_twitterpassage"."retweet_count", "connectr_twitterpassage"."favorited_count", "connectr_twitterpassage"."lang", "connectr_twitterpassage"."location", "connectr_twitterpassage"."author_followers_count", "connectr_twitterpassage"."is_retweet", "connectr_twitterpassage"."url", "connectr_twitterpassage"."author_fk_id", "connectr_twitterpassage"."quality", "connectr_twitterpassage"."is_top_tweet", "connectr_twitterpassage"."created", "connectr_twitterpassage"."modified" 
    FROM "connectr_twitterpassage" INNER JOIN "connectr_twitterpassage_words" 
    ON ("connectr_twitterpassage"."id" = "connectr_twitterpassage_words"."twitterpassage_id") 
    WHERE ("connectr_twitterpassage_words"."word_id" = 19514309  
    AND NOT (("connectr_twitterpassage"."id" 
    IN (SELECT U1."passage_id" FROM "connectr_passageviewevent" U1 WHERE (U1."user_id" = 1  AND U1."passage_id" IS NOT NULL)) AND "connectr_twitterpassage"."id" IS NOT NULL))) 
    ORDER BY "connectr_twitterpassage"."quality" DESC LIMIT 20

After adding these indexes:

create index word_to_twitterpassage_id on connectr_twitterpassage_words (word_id,twitterpassage_id);
create index id_to_quality_sorted on connectr_twitterpassage (id,quality desc nulls last);

the EXPLAIN ANALYZE output now looks like this:

 Limit  (cost=34679.26..34679.31 rows=20 width=206) (actual time=7.883..7.887 rows=20 loops=1)
   ->  Sort  (cost=34679.26..34681.02 rows=704 width=206) (actual time=7.882..7.884 rows=20 loops=1)
         Sort Key: connectr_twitterpassage.quality
         Sort Method: top-N heapsort  Memory: 32kB
         ->  Nested Loop  (cost=16.86..34660.53 rows=704 width=206) (actual time=2.669..7.618 rows=102 loops=1)
               ->  Index Only Scan using word_to_twitterpassage_id on connectr_twitterpassage_words  (cost=0.00..67.21 rows=1408 width=4) (actual time=2.493..3.094 rows=102 loops=1)
                     Index Cond: (word_id = 18860699)
                     Heap Fetches: 1
               ->  Index Scan using connectr_passage_pkey on connectr_twitterpassage  (cost=16.86..24.56 rows=1 width=206) (actual time=0.042..0.043 rows=1 loops=102)
                     Index Cond: (id = connectr_twitterpassage_words.twitterpassage_id)
                     Filter: ((NOT (hashed SubPlan 1)) OR (id IS NULL))
                     SubPlan 1
                       ->  Bitmap Heap Scan on connectr_passageviewevent u1  (cost=4.46..16.80 rows=27 width=4) (actual time=0.049..0.066 rows=25 loops=1)
                             Recheck Cond: (user_id = 1)
                             Filter: (passage_id IS NOT NULL)
                             ->  Bitmap Index Scan on connectr_passageviewevent_user_id  (cost=0.00..4.45 rows=27 width=0) (actual time=0.037..0.037 rows=26 loops=1)
                                   Index Cond: (user_id = 1)
 Total runtime: 8.042 ms
(18 rows)
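
When comparing the two plans, note that the `actual time` of a node is an average per execution, so the time spent in a node is roughly its end time multiplied by `loops`. The sketch below applies that arithmetic to figures copied from the two EXPLAIN ANALYZE outputs; the `node_total_ms` helper is a hypothetical name used only for illustration. The two runs use different `word_id` values (1249 matching rows versus 102) and may differ in cache state, so the comparison is indicative at best.

```python
def node_total_ms(actual_time_end_ms, loops):
    """Approximate time spent in a plan node: per-loop end time times loops."""
    return actual_time_end_ms * loops


# Figures copied from the EXPLAIN ANALYZE output before the new indexes.
before = {
    "join-table index scan": node_total_ms(812.018, 1),
    "passage pkey index scan": node_total_ms(0.525, 1249),
}
# Figures copied from the EXPLAIN ANALYZE output after the new indexes.
after = {
    "join-table index-only scan": node_total_ms(3.094, 1),
    "passage pkey index scan": node_total_ms(0.043, 102),
}

for label, plan in (("before", before), ("after", after)):
    for node, ms in plan.items():
        print(f"{label:6} {node:28} ~{ms:8.1f} ms")
```

In the first plan the two index scans account for roughly 812 ms and 656 ms of the 1495 ms total; in the second they take about 3 ms and 4 ms.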

1 Answer:

Answer 0 (score: 0)

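As discussed above, your problem was caused by missing foreign-key indexes, which led to nested loop joins driven by other conditions. Adding these indexes fixes the problem.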

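As a general rule in PostgreSQL, you should index every foreign key on any table of non-trivial size (primary keys and unique fields are indexed automatically), and then add whatever other indexes you actually need. In this case, you were missing foreign-key indexes.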
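
To see the effect of a composite index such as `word_to_twitterpassage_id` from the question, here is a small sketch that uses Python's built-in `sqlite3` module as a stand-in. SQLite's planner differs from PostgreSQL's, and the `passage_words` table and its data are hypothetical, so this only illustrates the idea that a lookup by `word_id` can be served from the index when the index also contains `twitterpassage_id`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE passage_words (
        id INTEGER PRIMARY KEY,
        twitterpassage_id INTEGER NOT NULL,
        word_id INTEGER NOT NULL
    );
    -- Same column order as the index added in the question.
    CREATE INDEX word_to_twitterpassage_id
        ON passage_words (word_id, twitterpassage_id);
    """
)
# Hypothetical data: 5000 passage/word pairs spread over 50 words.
conn.executemany(
    "INSERT INTO passage_words (twitterpassage_id, word_id) VALUES (?, ?)",
    [(p, p % 50) for p in range(5000)],
)

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT twitterpassage_id FROM passage_words WHERE word_id = ?",
    (7,),
).fetchall()
# The last column of each plan row holds the human-readable detail.
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

The printed plan should name the composite index, which is the SQLite counterpart of the `Index Only Scan using word_to_twitterpassage_id` node in the second EXPLAIN ANALYZE output.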