Question

When aggregating a queryset, I noticed that I get a wrong result if I use an annotation beforehand. I can't understand why.

The Code

from django.db.models import QuerySet, Max, F, ExpressionWrapper, DecimalField, Sum
from orders.models import OrderOperation

class OrderOperationQuerySet(QuerySet):
    def last_only(self) -> QuerySet:
        return self \
            .annotate(last_oo_pk=Max('order__orderoperation__pk')) \
            .filter(pk=F('last_oo_pk'))

    @staticmethod
    def _hist_price(orderable_field):
        return ExpressionWrapper(
            F(f'{orderable_field}__hist_unit_price') * F(f'{orderable_field}__quantity'),
            output_field=DecimalField())

    def ordered_articles_data(self):
        return self.aggregate(
            sum_ordered_articles_amounts=Sum(self._hist_price('orderedarticle')))

The Test

qs1 = OrderOperation.objects.filter(order__pk=31655)
qs2 = OrderOperation.objects.filter(order__pk=31655).last_only()
assert qs1.count() == qs2.count() == 1 and qs1[0] == qs2[0]  # shows that both querysets contain the same object

qs1.ordered_articles_data()
> {'sum_ordered_articles_amounts': Decimal('3.72')}  # expected result

qs2.ordered_articles_data()
> {'sum_ordered_articles_amounts': Decimal('3.01')}  # wrong result

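How can this last_only annotation method make the aggregation result different (and wrong)?

The "funny" thing is that it seems to happen only when the order contains articles that have the same hist_price (the screenshot that illustrated this is not available).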

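Side notes

I can confirm that the SQL produced by the Django ORM is probably wrong, because when I force the execution of last_only() and then call the aggregation in a second query, it works as expected.
Could https://docs.djangoproject.com/en/1.11/topics/db/aggregation/#combining-multiple-aggregations be an explanation?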
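
The two-step workaround from the side note can be imitated with plain SQL. This is only a sketch using Python's sqlite3 module: the table layout is a simplified guess at the real schema, the rows are hypothetical (prices in cents), and the orders_order join and the "deleted" filter are left out. The first query resolves the last operation's id on its own, and the second query aggregates without any GROUP BY. In Django terms this would roughly correspond to materialising the primary keys first (for example with list(qs.values_list('pk', flat=True))) before aggregating; that mapping is an assumption, not something shown in the post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orderoperation (id INTEGER PRIMARY KEY, order_id INTEGER);
    CREATE TABLE orderedarticle (
        id INTEGER PRIMARY KEY,
        order_operation_id INTEGER,
        hist_unit_price INTEGER,  -- cents, hypothetical values
        quantity INTEGER
    );
    INSERT INTO orderoperation VALUES (1, 31655), (2, 31655);
    -- Operation 2 holds two articles with identical price and quantity.
    INSERT INTO orderedarticle VALUES
        (1, 1, 500, 1), (2, 2, 71, 1), (3, 2, 71, 1), (4, 2, 230, 1);
""")

# Step 1: find the last operation of the order, with no join to the articles.
last_ids = [row[0] for row in conn.execute("""
    SELECT oo.id
    FROM orderoperation oo
        LEFT OUTER JOIN orderoperation t3 ON oo.order_id = t3.order_id
    WHERE oo.order_id = 31655
    GROUP BY oo.id
    HAVING oo.id = MAX(t3.id)
""")]

# Step 2: aggregate over those ids in a separate query (no GROUP BY involved).
placeholders = ",".join("?" * len(last_ids))
total = conn.execute(f"""
    SELECT SUM(a.hist_unit_price * a.quantity)
    FROM orderoperation oo
        LEFT OUTER JOIN orderedarticle a ON oo.id = a.order_operation_id
    WHERE oo.id IN ({placeholders})
""", last_ids).fetchone()[0]

print(last_ids, total)  # prints: [2] 372
```

Because the article rows never pass through a GROUP BY here, both 71-cent articles are counted.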

The SQL queries (Note that these are the actual queries, but the code above has been slightly simplified, which is why COALESCE and "deleted" IS NULL appear below.)

- qs1.ordered_articles_data()

SELECT
    COALESCE(
        SUM(
            ("orders_orderedarticle"."hist_unit_price" * "orders_orderedarticle"."quantity")
        ),
        0) AS "sum_ordered_articles_amounts"
FROM "orders_orderoperation"
    LEFT OUTER JOIN "orders_orderedarticle"
        ON ("orders_orderoperation"."id" = "orders_orderedarticle"."order_operation_id")
WHERE ("orders_orderoperation"."order_id" = 31655 AND "orders_orderoperation"."deleted" IS NULL)

- qs2.ordered_articles_data()

SELECT COALESCE(SUM(("__col1" * "__col2")), 0)
FROM (
    SELECT
        "orders_orderoperation"."id" AS Col1,
        MAX(T3."id") AS "last_oo_pk",
        "orders_orderedarticle"."hist_unit_price" AS "__col1",
        "orders_orderedarticle"."quantity" AS "__col2"
    FROM "orders_orderoperation" INNER JOIN "orders_order"
        ON ("orders_orderoperation"."order_id" = "orders_order"."id")
        LEFT OUTER JOIN "orders_orderoperation" T3
            ON ("orders_order"."id" = T3."order_id")
        LEFT OUTER JOIN "orders_orderedarticle"
            ON ("orders_orderoperation"."id" = "orders_orderedarticle"."order_operation_id")
    WHERE ("orders_orderoperation"."order_id" = 31655 AND "orders_orderoperation"."deleted" IS NULL)
    GROUP BY
        "orders_orderoperation"."id",
        "orders_orderedarticle"."hist_unit_price",
        "orders_orderedarticle"."quantity"
    HAVING "orders_orderoperation"."id" = (MAX(T3."id"))
) subquery

Answer 1

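When you use any annotation (an aggregate function) in the database language, you have to group by all the fields that are not inside the function, and you can see this in the subquery: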

GROUP BY
    "orders_orderoperation"."id",
    "orders_orderedarticle"."hist_unit_price",
    "orders_orderedarticle"."quantity"
HAVING "orders_orderoperation"."id" = (MAX(T3."id"))

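As a result, articles with the same hist_unit_price and quantity end up in the same group and are filtered by the max id. So, judging by your screenshot, the HAVING condition excludes one of chocolate or cafe.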
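
The effect of that GROUP BY can be reproduced outside Django. The following is a minimal sketch with Python's sqlite3 module, assuming a simplified two-table schema and hypothetical rows (prices in cents; the orders_order join and the "deleted" filter are omitted). Two articles share the same price and quantity, so they land in one group and only one of them reaches the outer SUM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orderoperation (id INTEGER PRIMARY KEY, order_id INTEGER);
    CREATE TABLE orderedarticle (
        id INTEGER PRIMARY KEY,
        order_operation_id INTEGER,
        hist_unit_price INTEGER,  -- cents, hypothetical values
        quantity INTEGER
    );
    INSERT INTO orderoperation VALUES (1, 31655);
    -- Articles 1 and 2 have the same price and quantity.
    INSERT INTO orderedarticle VALUES (1, 1, 71, 1), (2, 1, 71, 1), (3, 1, 230, 1);
""")

# Shape of the qs1 query: a plain aggregate over the join.
plain_sum = conn.execute("""
    SELECT SUM(a.hist_unit_price * a.quantity)
    FROM orderoperation oo
        LEFT OUTER JOIN orderedarticle a ON oo.id = a.order_operation_id
    WHERE oo.order_id = 31655
""").fetchone()[0]

# Shape of the qs2 query: the annotation forces a GROUP BY on price and
# quantity, which merges the two identical articles into a single row.
grouped_sum = conn.execute("""
    SELECT SUM(col1 * col2)
    FROM (
        SELECT oo.id, MAX(t3.id) AS last_oo_pk,
               a.hist_unit_price AS col1, a.quantity AS col2
        FROM orderoperation oo
            LEFT OUTER JOIN orderoperation t3 ON oo.order_id = t3.order_id
            LEFT OUTER JOIN orderedarticle a ON oo.id = a.order_operation_id
        WHERE oo.order_id = 31655
        GROUP BY oo.id, a.hist_unit_price, a.quantity
        HAVING oo.id = MAX(t3.id)
    ) subquery
""").fetchone()[0]

print(plain_sum, grouped_sum)  # prints: 372 301
```

With these made-up rows the difference is exactly one 71-cent article, which matches the pattern in the question (3.72 versus 3.01).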

Answer 2

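Separating smaller joins into subqueries is a solution that prevents problems when more joins to child objects are added. It may also avoid an unnecessarily huge Cartesian product of independent sets, or complicated control over the GROUP BY clause in the resulting SQL as more elements of the query contribute to it.

Solution: a subquery is used to get the primary keys of the last order operations. A simple query without added joins or groups is then used, so it is not distorted by a possible aggregation on children.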

    def last_only(self) -> QuerySet:
        max_ids = (self.values('order').order_by()
                   .annotate(last_oo_pk=Max('order__orderoperation__pk'))
                   .values('last_oo_pk')
                   )
        return self.filter(pk__in=max_ids)

Test

ret = (OrderOperationQuerySet(OrderOperation).filter(order__in=[some_order])
       .last_only().ordered_articles_data())

Executed SQL (simplified by removing the app name prefix orders_ and the double quotes):

SELECT CAST(SUM((orderedarticle.hist_unit_price * orderedarticle.quantity)) AS NUMERIC)
           AS sum_ordered_articles_amounts
FROM orderoperation
    LEFT OUTER JOIN orderedarticle
        ON (orderoperation.id = orderedarticle.order_operation_id)
WHERE (
    orderoperation.order_id IN (31655)
    AND orderoperation.id IN (
        SELECT MAX(U2.id) AS last_oo_pk
        FROM orderoperation U0
            INNER JOIN order U1 ON (U0.order_id = U1.id)
            LEFT OUTER JOIN orderoperation U2 ON (U1.id = U2.order_id)
        WHERE U0.order_id IN (31655)
        GROUP BY U0.order_id
    )
)

The original invalid SQL could be fixed by adding "orders_orderedarticle"."id" to GROUP BY, but only if last_only() and ordered_articles_data() are used together. That is not a readable approach.
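
Both fixes mentioned in this answer can be checked against a small in-memory database. This is a sketch with Python's sqlite3 module, not the real schema: the tables are simplified, the rows are hypothetical (prices in cents), and the join through the order table is dropped. The first query filters by a subquery that returns the last operation's id; the second keeps the original GROUP BY shape but adds the article id to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orderoperation (id INTEGER PRIMARY KEY, order_id INTEGER);
    CREATE TABLE orderedarticle (
        id INTEGER PRIMARY KEY,
        order_operation_id INTEGER,
        hist_unit_price INTEGER,  -- cents, hypothetical values
        quantity INTEGER
    );
    INSERT INTO orderoperation VALUES (1, 31655), (2, 31655);
    -- Operation 2 (the last one) has two articles with equal price and quantity.
    INSERT INTO orderedarticle VALUES
        (1, 1, 500, 1), (2, 2, 71, 1), (3, 2, 71, 1), (4, 2, 230, 1);
""")

# Fix 1: pick the last operation in a subquery, aggregate without GROUP BY.
subquery_sum = conn.execute("""
    SELECT SUM(a.hist_unit_price * a.quantity)
    FROM orderoperation oo
        LEFT OUTER JOIN orderedarticle a ON oo.id = a.order_operation_id
    WHERE oo.order_id IN (31655)
      AND oo.id IN (
          SELECT MAX(u2.id)
          FROM orderoperation u0
              LEFT OUTER JOIN orderoperation u2 ON u0.order_id = u2.order_id
          WHERE u0.order_id IN (31655)
          GROUP BY u0.order_id
      )
""").fetchone()[0]

# Fix 2: keep the grouped subquery, but add a.id so that every article
# forms its own group and identical articles are no longer merged.
regrouped_sum = conn.execute("""
    SELECT SUM(col1 * col2)
    FROM (
        SELECT oo.id, MAX(t3.id) AS last_oo_pk,
               a.hist_unit_price AS col1, a.quantity AS col2
        FROM orderoperation oo
            LEFT OUTER JOIN orderoperation t3 ON oo.order_id = t3.order_id
            LEFT OUTER JOIN orderedarticle a ON oo.id = a.order_operation_id
        WHERE oo.order_id = 31655
        GROUP BY oo.id, a.id, a.hist_unit_price, a.quantity
        HAVING oo.id = MAX(t3.id)
    ) subquery
""").fetchone()[0]

print(subquery_sum, regrouped_sum)  # prints: 372 372
```

Both variants count the two identical articles separately. The subquery form has the advantage that the outer aggregate does not depend on which columns happen to be in a GROUP BY.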

Django: aggregation returns wrong result after using annotation
