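How to retrieve the top X objects with unique attribute values

Time: 2011-07-02 12:58:20

Tags: python django

In one of my Django applications, I am looking for an elegant and efficient solution to a problem that can be described with the following example.

Given these models: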

class Author(models.Model):
    name = models.CharField()

class Book(models.Model):
    # String reference, because Collection is defined further down
    collection = models.ForeignKey('Collection')
    publication = models.DateField()

class Collection(models.Model):
    name = models.CharField()
    author = models.ForeignKey(Author)

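I want to retrieve the 4 (or any other small number of) most recently published books, but they must come from 4 different authors. That is, if the 2 most recent books are by the same author, I want only one of them in my top 4, leaving 3 slots for other authors.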

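I have considered doing this in several steps: retrieve the latest publications, then test them one by one while storing the author values, and fetch more of the latest publications whenever an author appears more than once. However, this runs on my home page, so I need the code to be as efficient as possible.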
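A minimal sketch of the multi-step idea described above, in plain Python rather than Django. `BookRow` and `latest_by_distinct_author` are hypothetical names used only for illustration, and the input is assumed to be a list already loaded in memory (in a real view it would come from a queryset ordered by publication date):

```python
from collections import namedtuple
from datetime import date

# Hypothetical stand-in for a Book row joined to its author.
BookRow = namedtuple("BookRow", ["title", "author", "publication"])


def latest_by_distinct_author(books, limit=4):
    """Walk the books from newest to oldest, keeping one per author."""
    picked = []
    seen_authors = set()
    for book in sorted(books, key=lambda b: b.publication, reverse=True):
        if book.author in seen_authors:
            continue  # this author already holds a slot
        seen_authors.add(book.author)
        picked.append(book)
        if len(picked) == limit:
            break  # stop as soon as every slot is filled
    return picked


sample = [
    BookRow("A2", "alice", date(2011, 6, 1)),
    BookRow("A1", "alice", date(2011, 5, 1)),
    BookRow("B1", "bob", date(2011, 4, 1)),
    BookRow("C1", "carol", date(2011, 3, 1)),
]
top = latest_by_distinct_author(sample, limit=2)
```

With this sample data, `top` holds "A2" and "B1": the second book by alice is skipped so that bob gets the remaining slot.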

Any help would be highly appreciated. Thanks.

2 Answers:

Answer 0: (score: 0)

This post may answer your question:

Django: Distinct foreign keys

Answer 1: (score: 0)

You can use annotate, extra, or raw. Here is how you would use annotate:

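from django.db.models import Max

# One query ranks authors by their newest book; one more query per author fetches it.
books = [a.book_set.latest('pub_date') for a in Author.objects
                   .annotate(latest=Max('book__pub_date'))
                   .order_by('-latest')[:5]]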

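Assuming an author does not have several books with the same pub_date, you can use extra like this (note that the second query matches on pub_date alone, so a book by another author that shares one of those dates would also be returned):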

sql = '''SELECT MAX(app_book.pub_date)
         FROM app_book
         WHERE app_book.author_id=app_author.id'''
latest = Author.objects.extra(
                select={'latest': sql},
                order_by=['-latest'])[:5].values_list('latest')
books = Book.objects.filter(pub_date__in=[x[0] for x in latest]).order_by('-pub_date')

If you use raw, you can fetch all the books with a single query:

sql = '''SELECT * FROM app_book
         WHERE app_book.pub_date IN
           (SELECT MAX(app_book.pub_date)
            FROM app_book
            GROUP BY app_book.author_id)
         ORDER BY app_book.pub_date DESC'''
books = list(Book.objects.raw(sql)[:5])
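To try the raw SQL above without a Django project, here is a small sketch that runs the same query shape against an in-memory SQLite database. The sample rows and the `latest_per_author` helper are hypothetical, dates are stored as ISO strings so that MAX and ORDER BY compare them correctly, and a LIMIT is added in SQL instead of slicing in Python:

```python
import sqlite3

# Hypothetical sample data: (title, author_id, pub_date as ISO text)
ROWS = [
    ("A1", 1, "2011-01-01"),
    ("A2", 1, "2011-06-01"),
    ("B1", 2, "2011-03-01"),
    ("C1", 3, "2011-05-01"),
    ("C2", 3, "2011-02-01"),
]


def latest_per_author(rows, limit):
    """Return up to `limit` books, the newest one per author, newest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE app_book (id INTEGER PRIMARY KEY, title TEXT, "
        "author_id INTEGER, pub_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO app_book (title, author_id, pub_date) VALUES (?, ?, ?)",
        rows,
    )
    sql = """SELECT title, author_id, pub_date FROM app_book
             WHERE app_book.pub_date IN
               (SELECT MAX(app_book.pub_date)
                FROM app_book
                GROUP BY app_book.author_id)
             ORDER BY app_book.pub_date DESC
             LIMIT ?"""
    result = conn.execute(sql, (limit,)).fetchall()
    conn.close()
    return result


top_two = latest_per_author(ROWS, 2)
```

Like the extra version, this matches books on pub_date alone, so it only yields one book per author when the selected dates are not shared by any other book.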

I am assuming the models look like this:

class Author(models.Model):
    name = models.CharField(max_length=50)

class Book(models.Model):
    title = models.CharField(max_length=50)
    author = models.ForeignKey(Author)
    pub_date = models.DateTimeField()

    class Meta:
        get_latest_by = 'pub_date'

Just for fun, I thought I would benchmark the three approaches (using a database loaded with about 100,000 dummy books):

>>> %time annotate()
(0.274) SELECT "app_author"."id", "app_author"."name", MAX("app_book"."pub_date") AS "latest" FROM "app_author" LEFT OUTER JOIN "app_book" ON ("app_author"."id" = "app_book"."author_id") GROUP BY "app_author"."id", "app_author"."name", "app_author"."id", "app_author"."name" ORDER BY "latest" DESC LIMIT 5; args=()
(0.035) SELECT "app_book"."id", "app_book"."title", "app_book"."author_id", "app_book"."pub_date" FROM "app_book" WHERE "app_book"."author_id" = 10  ORDER BY "app_book"."pub_date" DESC LIMIT 1; args=(10,)
(0.036) SELECT "app_book"."id", "app_book"."title", "app_book"."author_id", "app_book"."pub_date" FROM "app_book" WHERE "app_book"."author_id" = 9  ORDER BY "app_book"."pub_date" DESC LIMIT 1; args=(9,)
(0.036) SELECT "app_book"."id", "app_book"."title", "app_book"."author_id", "app_book"."pub_date" FROM "app_book" WHERE "app_book"."author_id" = 8  ORDER BY "app_book"."pub_date" DESC LIMIT 1; args=(8,)
(0.036) SELECT "app_book"."id", "app_book"."title", "app_book"."author_id", "app_book"."pub_date" FROM "app_book" WHERE "app_book"."author_id" = 7  ORDER BY "app_book"."pub_date" DESC LIMIT 1; args=(7,)
(0.040) SELECT "app_book"."id", "app_book"."title", "app_book"."author_id", "app_book"."pub_date" FROM "app_book" WHERE "app_book"."author_id" = 6  ORDER BY "app_book"."pub_date" DESC LIMIT 1; args=(6,)
CPU times: user 0.32 s, sys: 0.15 s, total: 0.47 s
Wall time: 0.47 s
<<< [<Book: Susan>, <Book: Yasmin>, <Book: Carl>, <Book: Benny>, <Book: George>]

>>> %time extra()
(0.445) SELECT (SELECT MAX(app_book.pub_date)
             FROM app_book
             WHERE app_book.author_id=app_author.id) AS "latest" FROM "app_author" ORDER BY "latest" DESC LIMIT 5; args=()
(0.045) SELECT "app_book"."id", "app_book"."title", "app_book"."author_id", "app_book"."pub_date" FROM "app_book" WHERE "app_book"."pub_date" IN (2038-11-25 11:33:30.425836, 2038-11-24 11:33:30.424598, 2038-11-23 11:33:30.423435, 2038-11-22 11:33:30.422227, 2038-11-21 11:33:30.421045) ORDER BY "app_book"."pub_date" DESC; args=(u'2038-11-25 11:33:30.425836', u'2038-11-24 11:33:30.424598', u'2038-11-23 11:33:30.423435', u'2038-11-22 11:33:30.422227', u'2038-11-21 11:33:30.421045')
CPU times: user 0.32 s, sys: 0.18 s, total: 0.50 s
Wall time: 0.50 s
<<< [<Book: Susan>, <Book: Yasmin>, <Book: Carl>, <Book: Benny>, <Book: George>]

>>> %time raw()
(0.279) SELECT * FROM app_book
             WHERE app_book.pub_date IN
               (SELECT MAX(app_book.pub_date)
                FROM app_book
                GROUP BY app_book.author_id)
            ORDER BY app_book.pub_date DESC; args=()
CPU times: user 0.19 s, sys: 0.09 s, total: 0.28 s
Wall time: 0.28 s
<<< [<Book: Susan>, <Book: Yasmin>, <Book: Carl>, <Book: Benny>, <Book: George>]
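
The timings above come from the IPython %time magic. Outside IPython, a rough equivalent could look like the sketch below; `time_call` is a hypothetical helper, and it measures wall time only, not the per-query timings shown in the logs:

```python
import time


def time_call(func, *args, repeat=3, **kwargs):
    """Call func `repeat` times; return (best wall time in seconds, last result)."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best, result


# Example with a built-in function; in practice func would be one of the
# three approaches wrapped in a function, e.g. annotate, extra, or raw.
elapsed, value = time_call(sum, [1, 2, 3])
```

Taking the best of a few runs reduces noise from caching and other processes, which matters when the differences between approaches are a fraction of a second.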