Django: optimizing a many-to-many query

Date: 2011-04-12 14:02:56

Tags: python django query-optimization rails-postgresql

I have Post and Tag models:

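from django.db import models

class Tag(models.Model):
    """ Tag for blog entry """
    title           = models.CharField(max_length=255, unique=True)

class Post(models.Model):
    """ Blog entry """
    tags            = models.ManyToManyField(Tag)
    title           = models.CharField(max_length=255)
    text            = models.TextField()
    # Assumed fields (not shown in the original) used by the queries below:
    published       = models.BooleanField(default=False)
    added           = models.DateTimeField(auto_now_add=True)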

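I need to output a list of blog entries, with the set of tags for each post. I'd like to do this with only two queries, using this workflow:

  1. Get the list of posts
  2. Get the list of tags used in those posts
  3. Link the tags to the posts in Python

I'm having trouble with the last step. This is the code I came up with, but it gives me 'Tag' object has no attribute 'post__id':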

    #getting posts
    posts = Post.objects.filter(published=True).order_by('-added')[:20]
    #making a dict, like {5: <post>}
    post_list = dict([(obj.id, obj) for obj in posts])
    #gathering ids to list
    id_list = [obj.id for obj in posts]
    
    #tags used in given posts
    objects = Tag.objects.select_related('post').filter(post__id__in=id_list)
    relation_dict = {}
    for obj in objects:
        #Here I get: 'Tag' object has no attribute 'post__id'
        relation_dict.setdefault(obj.post__id, []).append(obj)
    
    for id, related_items in relation_dict.items():
        post_list[id].tags = related_items
    

Can you see the error? How can I solve this with the Django ORM, or do I have to write custom SQL?

EDIT:

I was able to solve this with a raw query:

    # id_list holds integers, so convert them to strings before joining
    objects = Tag.objects.raw("""
        SELECT
            bpt.post_id,
            t.*
        FROM
            blogs_post_tags AS bpt,
            blogs_tag AS t
        WHERE
            bpt.post_id IN (""" + ','.join(map(str, id_list)) + """)
            AND t.id = bpt.tag_id
    """)
    relation_dict = {}
    for obj in objects:
        relation_dict.setdefault(obj.post_id, []).append(obj)
    

I'd be very grateful if someone could point out how to avoid the raw query.

2 answers:

Answer 0 (score: 4)

This is what I usually do in this situation:

posts = Post.objects.filter(...)[:20]

post_id_map = {}
for post in posts:
    post_id_map[post.id] = post
    # Iteration causes the queryset to be evaluated and cached.
    # We can therefore annotate instances, e.g. with a custom `tag_list`.
    # Note: Don't assign to `tags`; that would modify the relation in the
    # database (and newer Django versions reject direct assignment outright).
    post.tag_list = []

# We'll now need all relations between Post and Tag.
# The auto-generated model that contains this data is `Post.tags.through`.
# Filter on the ids we already have instead of re-running the sliced query.
for t in Post.tags.through.objects.select_related('tag').filter(post_id__in=list(post_id_map)):
    post_id_map[t.post_id].tag_list.append(t.tag)

# Now you can iterate over `posts` again and use `tag_list` instead of `tags`.

It would be nicer to encapsulate this pattern somehow, so you may want to add a QuerySet method (e.g. select_tags()) that does this for you.
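The loop above can be pulled out into a reusable function. What follows is a minimal sketch, assuming plain objects: `attach_tag_lists` is a hypothetical name (not a Django API). It expects items with an `id` attribute and join rows with `post_id` and `tag` attributes (the shape of `Post.tags.through` rows fetched with `select_related('tag')`), and it uses `types.SimpleNamespace` stand-ins so it runs without a Django project.

```python
from types import SimpleNamespace


def attach_tag_lists(posts, through_rows, attr="tag_list"):
    """Give each post a list attribute holding its tags, in place.

    posts: objects with an ``id`` attribute (hypothetically, evaluated Posts).
    through_rows: objects with ``post_id`` and ``tag`` attributes
        (hypothetically, rows of Post.tags.through).
    Returns the posts as a list, in their original order.
    """
    posts = list(posts)  # materialize once so we annotate these exact objects
    by_id = {}
    for post in posts:
        setattr(post, attr, [])  # posts without tags still get an empty list
        by_id[post.id] = post
    for row in through_rows:
        post = by_id.get(row.post_id)
        if post is not None:  # skip rows for posts outside this page
            getattr(post, attr).append(row.tag)
    return posts


# Usage with stand-in objects instead of real model instances:
posts = [SimpleNamespace(id=1), SimpleNamespace(id=2)]
rows = [
    SimpleNamespace(post_id=1, tag="django"),
    SimpleNamespace(post_id=1, tag="python"),
]
attach_tag_lists(posts, rows)
print([p.tag_list for p in posts])  # [['django', 'python'], []]
```

Keeping the grouping in a plain function makes it easy to call from a custom QuerySet method later. For comparison, Django 1.4 and later provide `prefetch_related('tags')`, which performs this same two-query lookup for you.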

Answer 1 (score: 1)

If you must have it in two queries, I think you need custom SQL:

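from django.db import connection

def custom_query(post_ids):
    # Map each post id to the titles of its tags, in a single query.
    post_ids = tuple(post_ids)
    if not post_ids:  # "IN ()" is invalid SQL, so skip the query entirely
        return {}
    query = """
    SELECT "blogs_post_tags"."post_id", "blogs_tag"."title"
    FROM "blogs_post_tags"
    INNER JOIN "blogs_tag" ON ("blogs_post_tags"."tag_id" = "blogs_tag"."id")
    WHERE "blogs_post_tags"."post_id" IN %s
    """
    cursor = connection.cursor()
    # The database adapter (e.g. psycopg2) expands a tuple into "(1, 2, 3)".
    cursor.execute(query, [post_ids])
    results = {}
    for post_id, title in cursor.fetchall():
        results.setdefault(post_id, []).append(title)
    return results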

recent_posts = Post.objects.filter(published=True).order_by('-added')[:20]
# Iterating evaluates and caches recent_posts: this is the first query.
post_ids = [post.id for post in recent_posts]
post_tags = custom_query(post_ids)

`recent_posts` is your Post QuerySet, fetched and cached by one query; `post_tags` is a mapping from post IDs to tag titles, built by a second query.
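To see the whole two-query workflow end to end without a Django project, here is a minimal sketch using the standard-library `sqlite3` module. The schema is hypothetical and only reuses the table names from the question; `fetch_posts_with_tags` is an illustrative name. Note that `sqlite3` uses `?` placeholders where Django's cursor uses `%s`, and the `IN` list is built from one placeholder per id rather than by concatenating values into the SQL.

```python
import sqlite3
from collections import defaultdict


def fetch_posts_with_tags(conn, limit=20):
    """Return [(post_id, title, [tag titles])] using exactly two queries."""
    # Query 1: the page of posts.
    posts = conn.execute(
        "SELECT id, title FROM blogs_post ORDER BY id DESC LIMIT ?", (limit,)
    ).fetchall()
    ids = [post_id for post_id, _ in posts]
    tags_by_post = defaultdict(list)
    if ids:  # "IN ()" is a syntax error, so skip query 2 when there are no posts
        placeholders = ", ".join("?" * len(ids))  # e.g. "?, ?, ?"
        # Query 2: (post_id, tag title) pairs for just these posts.
        rows = conn.execute(
            "SELECT pt.post_id, t.title "
            "FROM blogs_post_tags AS pt "
            "INNER JOIN blogs_tag AS t ON pt.tag_id = t.id "
            f"WHERE pt.post_id IN ({placeholders}) "
            "ORDER BY t.title",
            ids,
        ).fetchall()
        # Step 3: link the tags to the posts in Python.
        for post_id, tag_title in rows:
            tags_by_post[post_id].append(tag_title)
    return [(post_id, title, tags_by_post[post_id]) for post_id, title in posts]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE blogs_post (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE blogs_tag (id INTEGER PRIMARY KEY, title TEXT UNIQUE);
CREATE TABLE blogs_post_tags (id INTEGER PRIMARY KEY, post_id INTEGER, tag_id INTEGER);
INSERT INTO blogs_post VALUES (1, 'First'), (2, 'Second');
INSERT INTO blogs_tag VALUES (1, 'django'), (2, 'python');
INSERT INTO blogs_post_tags (post_id, tag_id) VALUES (1, 1), (1, 2), (2, 2);
""")
print(fetch_posts_with_tags(conn))
# [(2, 'Second', ['python']), (1, 'First', ['django', 'python'])]
```

Grouping with `defaultdict(list)` plays the same role as the `setdefault(...).append(...)` calls in the answers above.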