Optimizing a slow Django queryset over a large and growing dataset

Time: 2015-09-19 16:31:24

Tags: python django performance postgresql django-queryset

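My page loads far too slowly. Somehow I need to improve the way the data is queried (caching? partial loading/pagination? etc.).

Note that I'm a Django noob and haven't fully wrapped my head around models.Manager and models.query.QuerySet yet, so apologies if this setup looks clumsy...

Currently the page takes about 18 seconds to load the queryset, and there are only about 500 records so far. On average there will be about 100 new records per day.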

[Screenshot: network stats]

The database is PostgreSQL.

The slow view

def approvals(request):
    ...
    approved_submissions = QuestSubmission.objects.all_approved()
    ...

The queryset

class QuestSubmissionQuerySet(models.query.QuerySet):
    ...

    def approved(self):
        return self.filter(is_approved=True)

    def completed(self):
        return self.filter(is_completed=True).order_by('-time_completed')

    ...

class QuestSubmissionManager(models.Manager):
    def get_queryset(self):
        return QuestSubmissionQuerySet(self.model, using=self._db)

    def all_approved(self, user=None):
        return self.get_queryset().approved().completed()

    ...

SQL generated by QuestSubmission.objects.all_approved():

'SELECT "quest_manager_questsubmission"."id", "quest_manager_questsubmission"."quest_id", "quest_manager_questsubmission"."user_id", "quest_manager_questsubmission"."ordinal", "quest_manager_questsubmission"."is_completed", "quest_manager_questsubmission"."time_completed", "quest_manager_questsubmission"."is_approved", "quest_manager_questsubmission"."time_approved", "quest_manager_questsubmission"."timestamp", "quest_manager_questsubmission"."updated", "quest_manager_questsubmission"."game_lab_transfer" FROM "quest_manager_questsubmission" WHERE ("quest_manager_questsubmission"."is_approved" = True AND "quest_manager_questsubmission"."is_completed" = True) ORDER BY "quest_manager_questsubmission"."time_completed" DESC'

The slow model

class QuestSubmission(models.Model):
    quest = models.ForeignKey(Quest)
    user = models.ForeignKey(settings.AUTH_USER_MODEL, related_name="quest_submission_user")
    ordinal = models.PositiveIntegerField(default = 1, help_text = 'indicating submissions beyond the first for repeatable quests')
    is_completed = models.BooleanField(default=False)
    time_completed = models.DateTimeField(null=True, blank=True)
    is_approved = models.BooleanField(default=False)
    time_approved = models.DateTimeField(null=True, blank=True)
    timestamp = models.DateTimeField(auto_now=True, auto_now_add=False)
    updated = models.DateTimeField(auto_now=False, auto_now_add=True)
    game_lab_transfer = models.BooleanField(default = False, help_text = 'XP not counted')

    class Meta:
        ordering = ["time_approved", "time_completed"]

    objects = QuestSubmissionManager()

    #other methods
    ....

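What strategies are there for fixing this? I tried Django's Paginator, but it seems to only split the display into pages while still loading the entire queryset.

3 answers:

Answer 0: (score: 6)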

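The first things to look at are:

  • Is this query slow because it returns a very large result set?

  • Is this query slow because filtering the table takes a while?

If it's the former, you don't have many good options other than "return less data".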

If it's the latter, you should probably run EXPLAIN against the database, but I'd guess you want an index, probably on (is_approved, is_completed). That can be done with:

class Meta:
    index_together = [
        ["is_completed", "is_approved"],
    ]
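To see what such an index changes, compare the query plan before and after creating it. The sketch below is only an illustration: it uses the standard-library sqlite3 module and SQLite's EXPLAIN QUERY PLAN as a stand-in for PostgreSQL's EXPLAIN, and the table name, index name and sample rows are made up. PostgreSQL's planner and output format differ, and on a table of only a few hundred rows it may well prefer a sequential scan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE questsubmission (
        id INTEGER PRIMARY KEY,
        is_completed INTEGER NOT NULL,
        is_approved INTEGER NOT NULL,
        time_completed TEXT
    )
    """
)
# Made-up sample data: 500 rows with a mix of flag values.
conn.executemany(
    "INSERT INTO questsubmission (is_completed, is_approved, time_completed) "
    "VALUES (?, ?, ?)",
    [(i % 2, int(i % 3 == 0), f"2015-09-{i % 28 + 1:02d}") for i in range(500)],
)

# Same shape as the SQL that all_approved() generates in the question.
QUERY = (
    "SELECT id FROM questsubmission "
    "WHERE is_approved = 1 AND is_completed = 1 "
    "ORDER BY time_completed DESC"
)


def query_plan(connection, sql):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    return [row[3] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]


plan_before = query_plan(conn, QUERY)

# Composite index on the two filter columns (hypothetical index name).
conn.execute(
    "CREATE INDEX submission_state_idx "
    "ON questsubmission (is_completed, is_approved)"
)
plan_after = query_plan(conn, QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```

On Django 2.1 and later, QuerySet.explain() returns the plan from the project's real database. Note also that index_together was deprecated in Django 4.2 in favor of Meta.indexes with models.Index(fields=[...]).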

Answer 1: (score: 2)

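If you are displaying related objects on the page, try using select_related().

As the Django documentation puts it for its blog/entry example: without select_related(), a loop over entries makes a database query on each iteration to fetch the related blog for each entry.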

Answer 2: (score: 2)

I would build a base queryset to work from and apply the required filters to it:

def approvals(request):
    ...
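    # As shown in the question, all_approved() is defined on the manager, not on
    # the queryset, so call it first and chain select_related() onto its result.
    approved_submissions = QuestSubmission.objects.all_approved().select_related('quest', 'user')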
    ...