Question

I'm trying my hand at the Riot API challenge, and I'm using Django as the backend, hosted on PythonAnywhere.com.

I have set up a database with a structure similar to the following:

class MatchDetails(models.Model):
    # Data fields

class Participant(models.Model):
    match = models.ForeignKey(MatchDetails)
    # Data fields

class Timeline(models.Model):
    participant = models.ForeignKey(Participant)
    # Data fields

# More fields, most with MatchDetails as foreign key.

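I wrote a function that retrieves and stores the data, and I now have nearly 40,000 games stored, each with 10 participants. My goal is to extract some statistics from this data, and I am basically doing something like this: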

allMatches = MatchDetails.objects.all()
for m in allMatches:
    participants = m.participant_set.all()
    for p in participants:
        # Increment some values
# save the result to the database

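This currently takes more than 2 hours.

2015-04-11 03:47:35 - Completed task, took 7942.00 seconds, return code was 0.

That is a ridiculous amount of time, isn't it? Is there any way I can speed it up?

I have tried using iterators, and I have also tried iterating with .values_list() and .all().values(), but I can't get at the objects related through foreign keys that way.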

How do I speed up iteration of a large dataset in Django

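Is there any way to access my foreign key objects when using values_list? Or is there something else I can do to speed this up? Any pointers would be much appreciated.

Thanks for reading!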

Answer 1

The best optimization at this point is to use prefetch_related():

allMatches = MatchDetails.objects.prefetch_related('participant_set')
for m in allMatches:
    for p in m.participant_set.all():
        # Increment some values
# save the result to the database

This reduces your number of queries from roughly 40,000 (one per match) to 2.
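To see where the drop in query count comes from, here is a minimal sketch using only the standard-library sqlite3 module rather than Django. The table names, the `kills` column and the `run` helper are made up for illustration, and the second half only approximates what prefetch_related() does (it loads all participants in one query and groups them in Python, whereas Django filters the second query to the matches it fetched).

```python
import sqlite3

# Hypothetical schema standing in for MatchDetails and Participant.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE match_details (id INTEGER PRIMARY KEY);
    CREATE TABLE participant (
        id INTEGER PRIMARY KEY,
        match_id INTEGER REFERENCES match_details(id),
        kills INTEGER
    );
""")
conn.executemany(
    "INSERT INTO match_details (id) VALUES (?)",
    [(i,) for i in range(1, 101)],
)
# 10 participants per match, with kills 0..9.
conn.executemany(
    "INSERT INTO participant (match_id, kills) VALUES (?, ?)",
    [(m, k) for m in range(1, 101) for k in range(10)],
)

query_count = 0


def run(sql, params=()):
    """Execute one SELECT and count it (illustrative helper)."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


# Naive loop: one query for the matches, then one more per match.
query_count = 0
naive_total = 0
for (match_id,) in run("SELECT id FROM match_details"):
    rows = run("SELECT kills FROM participant WHERE match_id = ?", (match_id,))
    for (kills,) in rows:
        naive_total += kills
naive_queries = query_count

# Prefetch-style: two queries in total, related rows grouped in Python.
query_count = 0
matches = run("SELECT id FROM match_details")
by_match = {}
for match_id, kills in run("SELECT match_id, kills FROM participant"):
    by_match.setdefault(match_id, []).append(kills)
prefetch_total = sum(sum(by_match.get(match_id, [])) for (match_id,) in matches)
prefetch_queries = query_count

print(naive_queries, prefetch_queries)
```

With 100 matches the naive loop issues 101 queries and the grouped version issues 2, and both compute the same total. The same ratio applies at 40,000 matches.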

Answer 2

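You could try using prefetch_related to speed things up.

Also, in your values_list you can fetch the attributes you need from related objects, e.g. "foreign_key_relation_name__attribute".

Using "iterator()" is also a good way to improve speed if you only iterate over the queryset once.

What does your "save the result to the database" code look like? If you use update() to save all the items in bulk instead of saving each item one by one, you will also gain speed.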
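As a rough picture of the values_list and iterator() suggestions: spanning a foreign key with a double underscore amounts to a JOIN that returns flat tuples, and iterating only once means rows can be consumed as they stream from the cursor. The sketch below shows that idea in plain sqlite3, not Django. The `region` and `kills` columns are hypothetical, and the SQL is only an approximation of what the ORM would generate.

```python
import sqlite3

# Hypothetical schema; "region" and "kills" are illustrative columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE match_details (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE participant (
        id INTEGER PRIMARY KEY,
        match_id INTEGER REFERENCES match_details(id),
        kills INTEGER
    );
    INSERT INTO match_details (id, region) VALUES (1, 'EUW'), (2, 'NA');
    INSERT INTO participant (match_id, kills) VALUES (1, 3), (1, 5), (2, 7);
""")

# One JOIN returns (related attribute, own attribute) tuples, so no
# per-row lookup of the related match is needed.
cursor = conn.execute("""
    SELECT m.region, p.kills
    FROM participant AS p
    JOIN match_details AS m ON m.id = p.match_id
""")

# Iterate the cursor directly instead of building a full list first.
kills_by_region = {}
for region, kills in cursor:
    kills_by_region[region] = kills_by_region.get(region, 0) + kills

print(kills_by_region)
```

In Django terms this would correspond to something like calling values_list('match__region', 'kills') on the Participant queryset, assuming such fields existed on your models.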

Answer 3

Depending on two factors, namely what you are updating and whether you are running Django 1.8, you may not have to iterate over everything at all.

from django.db.models import F
m.participant_set.update(some_field=F('some_field')*10)

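This updates some_field on all of the match's Participant rows to its current value multiplied by 10, which is orders of magnitude faster than iterating over all the rows and issuing an update for each one.

It is worth remembering that if you have overridden the Participant.save() method, it will not be called, and the save signals will not be sent either.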
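The reason this is so much faster is that the arithmetic happens inside the database in a single statement. Here is a small sketch of that idea using plain sqlite3 instead of Django. The table layout is an assumption, and the SQL is only roughly what an update() with an F() expression would issue.

```python
import sqlite3

# Hypothetical table standing in for Participant.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE participant ("
    "id INTEGER PRIMARY KEY, match_id INTEGER, some_field INTEGER)"
)
conn.executemany(
    "INSERT INTO participant (match_id, some_field) VALUES (?, ?)",
    [(1, 1), (1, 2), (2, 3)],
)

# A single UPDATE multiplies the column in place for every participant of
# match 1; no rows are loaded into Python and no per-row save happens.
cur = conn.execute(
    "UPDATE participant SET some_field = some_field * 10 WHERE match_id = ?",
    (1,),
)
rows_changed = cur.rowcount

values = [v for (v,) in conn.execute(
    "SELECT some_field FROM participant ORDER BY id"
)]
print(rows_changed, values)
```

Because no model instances are ever created, it also makes sense that a custom save() method or save signals never come into play.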
