Question

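What I'm trying to do:

I have models Topic and Entry. Entry has a ForeignKey to Topic. I need to list the topics in which the user has entries created in the last 24 hours. I also need to annotate a count: the total number of entries created after the last entry written by the user. (To put it more concretely, think of an inbox that lists conversations together with their number of unread messages.)

This is what I came up with: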

relevant_topics = (
    Entry.objects.filter(author=user, date_created__gte=time_threshold(hours=24))
    .values_list("topic__pk", flat=True)
    .order_by()
    .distinct()
)

qs = (
    Topic.objects.filter(pk__in=relevant_topics).annotate(
        latest=Max("entries__date_created", filter=Q(entries__author=user)),
        count=Count("entries", filter=Q(date_created__gte=F("latest__date_created"))),
    )
).values("title", "count")

Which throws:

FieldError: Cannot resolve keyword 'date_created' into field. Join on 'latest' not permitted.

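I don't really know whether Django itself doesn't support what I wrote, or whether my solution is at fault. I thought of adding the count with .extra(), but I couldn't figure out how to use the latest annotation there. I would really appreciate any query that produces the expected output.

Reference dataset: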

(assume the current user = Jack)

<User username: Jack>
<User username: John>

<Topic title: foo>
<Topic title: bar>
<Topic title: baz>

(Assume higher pk = created later.)

<Entry pk:1 topic:foo user:Jack>
<Entry pk:2 topic:foo user:Jack> (date_created in last 24 hours)
<Entry pk:3 topic:foo user:John> (date_created in last 24 hours)

<Entry pk:4 topic:bar user:Jack> (date_created in last 24 hours)

<Entry pk:5 topic:baz user:John> (date_created in last 24 hours)

Given the dataset, the output should only be:

<Topic:foo count:1>
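
To make the intended semantics concrete, here is a minimal pure-Python sketch of the rule applied to the reference dataset. The `Entry` dataclass, the `recent` flag (standing in for "date_created in the last 24 hours") and the `unread_counts` helper are made up for illustration and are not part of the actual models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    pk: int
    topic: str
    author: str
    recent: bool  # assumption: True if date_created is within the last 24 hours


# Reference dataset from the question (higher pk = created later).
ENTRIES = [
    Entry(1, "foo", "Jack", False),
    Entry(2, "foo", "Jack", True),
    Entry(3, "foo", "John", True),
    Entry(4, "bar", "Jack", True),
    Entry(5, "baz", "John", True),
]


def unread_counts(entries, user):
    """Map topic -> number of entries newer than the user's latest recent entry."""
    # Latest recent entry of the user in each topic.
    latest = {}
    for e in entries:
        if e.author == user and e.recent:
            latest[e.topic] = max(latest.get(e.topic, 0), e.pk)
    # Count only entries that are strictly newer than that one.
    counts = {}
    for e in entries:
        if e.topic in latest and e.pk > latest[e.topic]:
            counts[e.topic] = counts.get(e.topic, 0) + 1
    return counts


print(unread_counts(ENTRIES, "Jack"))  # {'foo': 1}
```

bar drops out because nothing was written after Jack's entry there, and baz drops out because Jack has no entry in it.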

Edit:

To give you an idea, here is a raw SQL solution that produces the correct output:

    pk = user.pk
    threshold = time_threshold(hours=24)

    with connection.cursor() as cursor:
        cursor.execute(
            """
        select
          s.title,
          s.slug,
          s.count
        from
          (
            select
              tt.title,
              tt.slug,
              e.count,
              e.max_id
            from
              (
                select
                  z.topic_id,
                  count(
                    case when z.id > k.max_id then z.id end
                  ) as count,
                  k.max_id
                from
                  dictionary_entry z
                  inner join (
                    select
                      topic_id,
                      max(de.id) as max_id
                    from
                      dictionary_entry de
                    where
                      de.date_created >= %s
                      and de.author_id = %s
                    group by
                      author_id,
                      topic_id
                  ) k on k.topic_id = z.topic_id
                group by
                  z.topic_id,
                  k.max_id
              ) e
              inner join dictionary_topic tt on tt.id = e.topic_id
          ) s
        where
          s.count > 0
        order by
          s.max_id desc
        """,
            [threshold, pk],
        )
        # convert to dict
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
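
For anyone who wants to try the idea without a Django project, the following sketch runs a condensed form of the query above against an in-memory SQLite database loaded with the reference dataset. The flattened query (with HAVING instead of the outer select), the column alias `new_entries`, the concrete timestamps and the author ids (1 = Jack, 2 = John) are my own assumptions; only the table and column names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dictionary_topic (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE dictionary_entry (
        id INTEGER PRIMARY KEY,
        topic_id INTEGER REFERENCES dictionary_topic (id),
        author_id INTEGER,
        date_created TEXT  -- ISO-8601 text, so string comparison follows time order
    );
    INSERT INTO dictionary_topic VALUES (1, 'foo'), (2, 'bar'), (3, 'baz');
    -- author 1 = Jack, author 2 = John (assumed ids)
    INSERT INTO dictionary_entry VALUES
        (1, 1, 1, '2020-04-01 09:00:00'),
        (2, 1, 1, '2020-04-10 10:00:00'),
        (3, 1, 2, '2020-04-10 11:00:00'),
        (4, 2, 1, '2020-04-10 12:00:00'),
        (5, 3, 2, '2020-04-10 13:00:00');
    """
)

QUERY = """
    SELECT t.title,
           COUNT(CASE WHEN z.id > k.max_id THEN z.id END) AS new_entries
    FROM dictionary_entry z
    INNER JOIN (
        -- latest recent entry of the user per topic
        SELECT topic_id, MAX(id) AS max_id
        FROM dictionary_entry
        WHERE date_created >= ? AND author_id = ?
        GROUP BY topic_id
    ) k ON k.topic_id = z.topic_id
    INNER JOIN dictionary_topic t ON t.id = z.topic_id
    GROUP BY z.topic_id, t.title, k.max_id
    HAVING COUNT(CASE WHEN z.id > k.max_id THEN z.id END) > 0
    ORDER BY k.max_id DESC
"""

# The threshold stands in for time_threshold(hours=24).
rows = conn.execute(QUERY, ("2020-04-10 00:00:00", 1)).fetchall()
print(rows)  # [('foo', 1)]
```

Note that SQLite uses `?` placeholders where the Django cursor above uses `%s`.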

Answer 1

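I rebuilt your query, and I hope I understood your goal correctly. I ran into the same error. It seems to be related to the way SQL evaluates the query. I rewrote your query as follows: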

    qs0 = Topic.objects.filter(
        entries__author=user, entries__date_created__gte=time_threshold(24)).annotate(
            latest=Max("entries__date_created")
        )
    qs1 = qs0.annotate(
        count=Count("entries", filter=Q(entries__date_created__gte=F("latest")))
        ).values("title", "count")

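So I first filter the topics in which the user has recent entries and annotate them with the date of the latest entry (qs0), and then try to annotate that query with the desired count. The first query does what it should; when I print it or evaluate it in a list, the results look correct to me (I used mock data). But with the second query I get the following error message: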

aggregate functions are not allowed in FILTER
LINE 1: ...") FILTER (WHERE "dummy_entry"."date_created" >= (MAX("dummy...

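Digging around on the Internet told me that this may be related to the way SQL handles WHERE. I tried both MySQL and PostgreSQL, and both raised an error. The second query looks syntactically correct to me, but because the first query is not evaluated before it is fed into the second one, the aggregate ends up inside the filter condition and the error occurs.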
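
The restriction can be reproduced outside Django. The sketch below uses SQLite from the standard library as a stand-in for the MySQL and PostgreSQL runs mentioned above, and a plain WHERE clause as an analogue of the FILTER clause; the `entry` table and its `created` column are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry (id INTEGER PRIMARY KEY, created INTEGER)")
conn.executemany("INSERT INTO entry (created) VALUES (?)", [(10,), (20,), (30,)])

# An aggregate used directly in a row-level condition is rejected,
# because the condition is evaluated before any grouping takes place.
try:
    conn.execute("SELECT COUNT(*) FROM entry WHERE created >= MAX(created)")
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)

# Computing the aggregate in a scalar subquery first is accepted.
count = conn.execute(
    "SELECT COUNT(*) FROM entry WHERE created >= (SELECT MAX(created) FROM entry)"
).fetchone()[0]
print(count)  # 1
```

The second statement suggests why a subquery-based approach can get around the error: the aggregate is evaluated on its own and only its result is compared row by row.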

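Anyway, I was able to get the desired result (again, if I understood you correctly), albeit in a very ugly way, by using the following code in place of the second query.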

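    # Map topic pk -> [title, date of the user's latest entry, count].
    topics = {}
    for item in qs0:
        topics[item.pk] = [item.title, item.latest, 0]

    # Only look at entries in those topics; any other topic would raise KeyError.
    for entry in Entry.objects.filter(topic_id__in=list(topics)):
        # Strictly later, so the user's own latest entry is not counted.
        if entry.date_created > topics[entry.topic_id][1]:
            topics[entry.topic_id][2] += 1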

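I put qs0 in a dictionary with the pk as key and counted all the entries by hand.

I'm afraid this is the best I can do. I really hope someone comes up with a more elegant solution!

Edit after reading Krysotl's answer:

This is not a final answer, but maybe it helps. In most cases an aggregate function cannot be used in a WHERE clause, see Aggregate function in SQL WHERE-Clause. Sometimes this can be fixed by replacing WHERE with HAVING in the SQL. Django can handle raw SQL queries, see https://docs.djangoproject.com/en/3.0/ref/models/expressions/#raw-sql-expressions. So I tried the following: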

sql_command = '''SELECT entry.topic_id, topic.title, entry.date_created, COUNT(entry.id) AS id__count FROM entry
        INNER JOIN topic ON (entry.topic_id = topic.id) GROUP BY entry.topic_id, topic.title, entry.date_created
        HAVING entry.date_created > (SELECT MAX(U0.date_created) AS latest
        FROM entry U0 WHERE (U0.author_id = 1 AND U0.date_created >= '2020-04-09 16:31:48.407501+00:00'
        AND U0.topic_id = (entry.topic_id)) GROUP BY U0.topic_id)'''

    qs = Entry.objects.annotate(val=RawSQL(sql_command, ()))

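In other words: put GROUP BY in front of WHERE and replace WHERE with HAVING. Unfortunately, it still gives me errors. I'm afraid I'm not enough of an SQL expert to fix this, but maybe it is a way forward.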

Answer 2

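This can be achieved with 1 SQL query in the database, by

filtering the relevant entries (the important part is OuterRef, which "transfers" the filter to topics),
grouping the entries by topic and using count, and then
annotating topics using Subquery.

More information about this can be found in the Django docs.

For your case, the following should produce the expected result.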

from django.db.models import Count, IntegerField, OuterRef, Subquery

relevant_topics = (
    models.Entry.objects.filter(
        author=user, date_created__gte=time_threshold(24), topic=OuterRef("pk"),
    )
    .order_by()
    .values("topic")
    .annotate(Count("id"))
    .values("id__count")
)

qs = models.Topic.objects.annotate(
    entries_count=Subquery(relevant_topics, output_field=IntegerField())
).filter(entries_count__gt=0)

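Hope this helps :-)

Edit 1:

I think I misunderstood the question and forgot to take into account that entries by other authors (created after the current author's last one) need to be counted.

So I came up with the following, which gives the same results as the answer of @Paul Rene: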

from django.db.models import Count, DateTimeField, F, Max, OuterRef, Subquery

latest_in_topic = (
    Entry.objects.filter(author=user, date_created__gte=time_threshold(24), topic=OuterRef("topic"))
    .values("topic")
    .annotate(latest=Max("date_created"))
)

qs = (
    Entry.objects.annotate(
        latest=Subquery(latest_in_topic.values("latest"), output_field=DateTimeField())
    )
    .filter(date_created__gt=F("latest"))
    .values("topic", "topic__title")
    .annotate(Count("id"))
)

res = [(t["topic__title"], t["id__count"]) for t in qs]

Edit 2: The ORM produces the following query (obtained with str(qs.query)). Maybe it offers some clues for improving performance.

SELECT "entry"."topic_id", "topic"."title", COUNT("entry"."id") AS "id__count"
FROM "entry"
         INNER JOIN "topic" ON ("entry"."topic_id" = "topic"."id")
WHERE "entry"."date_created" > (SELECT MAX(U0."date_created") AS "latest"
                                    FROM "entry" U0
                                    WHERE (U0."author_id" = 1 AND U0."date_created" >= '2020-04-09 16:31:48.407501+00:00' AND U0."topic_id" = ("entry"."topic_id"))
                                    GROUP BY U0."topic_id")
GROUP BY "entry"."topic_id", "topic"."title";
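
As a quick sanity check, a query of this shape can be run against the reference dataset in an in-memory SQLite database. This is only a sketch: the concrete timestamps, the author ids (1 = Jack, 2 = John) and the use of `?` parameters instead of inlined literals are assumptions, and SQLite stands in for whatever backend produced the SQL above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE topic (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE entry (
        id INTEGER PRIMARY KEY,
        topic_id INTEGER REFERENCES topic (id),
        author_id INTEGER,
        date_created TEXT  -- ISO-8601 text, so string comparison follows time order
    );
    INSERT INTO topic VALUES (1, 'foo'), (2, 'bar'), (3, 'baz');
    INSERT INTO entry VALUES
        (1, 1, 1, '2020-04-01 09:00:00'),
        (2, 1, 1, '2020-04-10 10:00:00'),
        (3, 1, 2, '2020-04-10 11:00:00'),
        (4, 2, 1, '2020-04-10 12:00:00'),
        (5, 3, 2, '2020-04-10 13:00:00');
    """
)

QUERY = """
    SELECT entry.topic_id, topic.title, COUNT(entry.id) AS id__count
    FROM entry
    INNER JOIN topic ON entry.topic_id = topic.id
    WHERE entry.date_created > (
        -- correlated subquery: the user's latest recent entry in the same topic
        SELECT MAX(U0.date_created)
        FROM entry U0
        WHERE U0.author_id = ?
          AND U0.date_created >= ?
          AND U0.topic_id = entry.topic_id
        GROUP BY U0.topic_id
    )
    GROUP BY entry.topic_id, topic.title
"""

rows = conn.execute(QUERY, (1, "2020-04-10 00:00:00")).fetchall()
print(rows)  # [(1, 'foo', 1)]
```

Topics where the user has no recent entry fall out on their own: the subquery yields NULL there, and a comparison with NULL is never true.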

Using annotated value in subsequent annotate throws FieldError
