Question

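I have a situation where I want to get the most popular comment for each thread in a list of threads. Currently I run a query inside a for loop, which can be very slow. Is there a way to eliminate the large number of queries this causes?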

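Using Django's prefetch_related queryset method is not acceptable, because it would retrieve all comments related to each thread (potentially a very large number). This is especially problematic because I only need one comment per thread (only the most popular one).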

Here is a simplified version of my models (with a bunch of irrelevant information removed for brevity).

class Thread(models.Model):
    def description(self):
        """ Returns most popular post based on votes. """
        return self.posts.annotate(_popularity=models.Count('votes')).order_by('-_popularity')[0]

class Post(models.Model):
    thread = models.ForeignKey('Thread', related_name='posts')
    text = models.CharField(max_length=settings.MAX_POST_LENGTH)

class Vote(models.Model):
    post = models.ForeignKey('Post', related_name='votes')

The code that gets all the descriptions looks essentially like this. threads is an already-evaluated queryset of Thread objects.

def descriptions(threads):
     for thread in threads:
         yield thread.description()

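So basically I have a number of threads, and I want a list containing the most popular comment of each thread. I want to do this in fewer than N queries, where N is the number of threads.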

Answer 1

It seems to me you are very close to the right answer.

def handle_popular_posts(threads):
    # One query: every post in the given threads, most-voted first.
    most_popular_posts = (
        Post.objects
        .filter(thread__id__in=threads)
        .annotate(_popularity=models.Count('votes'))
        .order_by('-_popularity')
        .select_related('thread')
    )
    for post in most_popular_posts:
        # your_code_here...
        pass

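I added .select_related('thread') because I believe you will want information about the parent thread, and without select_related Django issues a new query every time you access thread information beyond the id.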

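This query should be quite efficient: I ran a similar one on a database with a few hundred rows and it took ~10ms, whereas a single get by id takes ~5ms on the same database.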
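Note that the query above returns every post in the selected threads, ordered by popularity, rather than one post per thread. A minimal sketch of reducing that result to the top post per thread in Python follows; the Row namedtuple and the first_post_per_thread helper are hypothetical stand-ins for Post instances and your own loop body.

```python
from collections import namedtuple

# Hypothetical stand-in for a Post row annotated with its vote count.
Row = namedtuple("Row", ["id", "thread_id", "popularity"])


def first_post_per_thread(posts_by_popularity):
    """Keep the first (most popular) post seen for each thread_id.

    Assumes the input is already ordered by popularity, descending.
    """
    top = {}
    for post in posts_by_popularity:
        # setdefault stores a post only the first time a thread is seen.
        top.setdefault(post.thread_id, post)
    return top


rows = [
    Row(id=2, thread_id=1, popularity=3),
    Row(id=3, thread_id=2, popularity=2),
    Row(id=1, thread_id=1, popularity=1),
]
top = first_post_per_thread(rows)
```

This keeps it to a single query, but it still loads every post of those threads into memory, so it only suits threads with a moderate number of posts.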

Answer 2

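I believe there are at least two solutions. One (since you are using Postgres) is to use distinct. The other is to drop down to raw SQL. The former is simpler, so I will write a code example for it.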

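from django.db.models import Count

# distinct() with field names (DISTINCT ON) is PostgreSQL-only. The ordering
# puts the most-voted post first within each thread, so that row is kept.
most_popular_posts = Post.objects.all().annotate(
    popularity=Count('votes__id', distinct=True)
).select_related('thread').distinct('thread_id').order_by(
    '-thread_id', '-popularity'
)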
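The raw SQL alternative is not shown above. As a rough sketch only, here is one way it might look, using the standard-library sqlite3 module instead of Postgres so that it runs standalone. The table and column names (thread, post, vote) and the top_posts and demo helpers are assumptions for illustration, and the window function requires SQLite 3.25 or newer.

```python
import sqlite3

# Assumed schema: table and column names are hypothetical stand-ins
# for the Thread, Post and Vote models.
SCHEMA = """
CREATE TABLE thread (id INTEGER PRIMARY KEY);
CREATE TABLE post (
    id INTEGER PRIMARY KEY,
    thread_id INTEGER NOT NULL REFERENCES thread(id),
    text TEXT NOT NULL
);
CREATE TABLE vote (
    id INTEGER PRIMARY KEY,
    post_id INTEGER NOT NULL REFERENCES post(id)
);
"""

# Count votes per post, rank posts within each thread, keep rank 1.
TOP_POST_SQL = """
WITH counted AS (
    SELECT p.id, p.thread_id, p.text, COUNT(v.id) AS popularity
    FROM post AS p
    LEFT JOIN vote AS v ON v.post_id = p.id
    WHERE p.thread_id IN ({placeholders})
    GROUP BY p.id, p.thread_id, p.text
),
ranked AS (
    SELECT id, thread_id, text, popularity,
           ROW_NUMBER() OVER (
               PARTITION BY thread_id
               ORDER BY popularity DESC, id
           ) AS rn
    FROM counted
)
SELECT thread_id, id, text, popularity
FROM ranked
WHERE rn = 1
ORDER BY thread_id
"""


def top_posts(conn, thread_ids):
    """Return one (thread_id, post_id, text, popularity) row per thread."""
    thread_ids = list(thread_ids)
    if not thread_ids:
        return []
    placeholders = ", ".join("?" for _ in thread_ids)
    return conn.execute(
        TOP_POST_SQL.format(placeholders=placeholders), thread_ids
    ).fetchall()


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO thread (id) VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO post (id, thread_id, text) VALUES (?, ?, ?)",
        [(1, 1, "first"), (2, 1, "second"), (3, 2, "third")],
    )
    # Post 1 gets one vote, post 2 gets three, post 3 gets none.
    conn.executemany(
        "INSERT INTO vote (post_id) VALUES (?)", [(1,), (2,), (2,), (2,)]
    )
    return top_posts(conn, [1, 2])
```

Unlike fetching all posts and filtering in Python, this returns exactly one row per thread in a single query. In Django, a statement of this shape could be run through Post.objects.raw() or a database cursor, with the table names adjusted to whatever your app actually generates.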

Eliminating queries in Django
