Django - latest related status record for each object in a single queryset (large dataset)

Date: 2016-04-15 00:56:25

Tags: python mysql sql django django-queryset

[Edit: using Django 1.9 & MySQL 5.6; no DISTINCT ON keyword]

I have two models roughly equivalent to the following:

from django.db import models


class Vehicle(models.Model):
    vin = models.CharField(max_length=255)
    ...  # lots more not-interesting fields


class Status(models.Model):
    """The status of a vehicle at a moment in time"""
    vehicle = models.ForeignKey(Vehicle, related_name='status')
    code = models.CharField(max_length=20)
    time = models.DateTimeField()

    class Meta:
        ordering = ('time',)

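How can I return the current status of every vehicle with a single query? There are hundreds of vehicles and hundreds of thousands of status records.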

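Looping over each vehicle and selecting its latest status is too slow for the number of vehicles (hundreds) and statuses (hundreds of thousands).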

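I tried doing this with .annotate() and .values(); why doesn't this work? I expected it to return the Cartesian product of the vehicle and status tables, and then filter out every status except the latest one.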

from django.db.models import F, Max

vehicles = Vehicle.objects.annotate(
    status_time=F('status__time'),
    status_time_latest=Max('status_time'),
    status_code=F('status__code'),
).filter(
    status_time=F('status_time_latest'),
).values()

Instead, Django (1.9) seems to return only the first status code for each vehicle (ordered by ID).

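Is this what select_related() is for, or would that end up transferring the whole status table over the network? The table is too large to dump every time I need to run this query; I would rather offload the processing to the database server.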

1 Answer:

Answer 0: (score: 2)

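You can combine order_by and distinct to achieve what you want. Note that passing field names to distinct() is supported only on PostgreSQL, so this will not run on the MySQL 5.6 setup mentioned in the question's edit: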

from django.db.models import F

vehicles = (Vehicle.objects
            .annotate(status_time=F('status__time'), status_code=F('status__code'))
            .order_by('id', '-status_time').distinct('id'))

Breakdown:

# first annotate all vehicle objects with all the statuses
vehicles = Vehicle.objects.annotate(status_time=F('status__time'), status_code=F('status__code'))

# order by id, then by status_time in decreasing order
vehicles = vehicles.order_by('id', '-status_time')

# keep one row per id: the first row for each Vehicle is retained, and
# because rows are ordered by decreasing status_time within each vehicle,
# that first row carries the latest status
vehicles = vehicles.distinct('id')
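
Since the question's edit says MySQL 5.6 has no DISTINCT ON, here is a rough sketch of the same "latest row per vehicle" idea written as a join against a grouped subquery, which needs neither DISTINCT ON nor window functions. It uses Python's built-in sqlite3 module as a stand-in database so that it runs anywhere. The table names `vehicle` and `status`, the column names, the sample rows, and the helper `latest_status_per_vehicle` are all assumptions made for illustration and are not taken from the asker's real schema (Django would normally prefix table names with the app label).

```python
import sqlite3


def latest_status_per_vehicle(conn):
    """Return (vin, code, time) for each vehicle's most recent status.

    Hypothetical helper: table and column names are assumed, not taken
    from the original schema.
    """
    sql = """
        SELECT v.vin, s.code, s.time
        FROM vehicle AS v
        JOIN status AS s ON s.vehicle_id = v.id
        JOIN (
            -- one row per vehicle holding its newest timestamp
            SELECT vehicle_id, MAX(time) AS latest_time
            FROM status
            GROUP BY vehicle_id
        ) AS latest
          ON latest.vehicle_id = s.vehicle_id
         AND latest.latest_time = s.time
        ORDER BY v.id
    """
    return conn.execute(sql).fetchall()


# In-memory SQLite database with made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vehicle (id INTEGER PRIMARY KEY, vin TEXT);
    CREATE TABLE status (
        id INTEGER PRIMARY KEY,
        vehicle_id INTEGER REFERENCES vehicle(id),
        code TEXT,
        time TEXT
    );
    INSERT INTO vehicle (id, vin) VALUES (1, 'VIN-A'), (2, 'VIN-B');
    INSERT INTO status (vehicle_id, code, time) VALUES
        (1, 'parked',  '2016-04-01 08:00:00'),
        (1, 'moving',  '2016-04-02 09:30:00'),
        (2, 'service', '2016-04-03 12:00:00'),
        (2, 'parked',  '2016-03-28 17:45:00');
""")

rows = latest_status_per_vehicle(conn)
for vin, code, time in rows:
    print(vin, code, time)
```

In a Django project the same SELECT could presumably be issued through a database cursor (django.db.connection.cursor()) after adjusting the table names. Two caveats: a vehicle with no status rows is dropped by the inner joins, and if two statuses for one vehicle share the same latest time, both rows come back. A composite index on (vehicle_id, time) would likely help the grouped subquery on a table of this size, though that is untested here.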