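SQLAlchemy query to find grandchildren filtered by parent and grandparent

Time: 2014-01-02 11:33:36

Tags: python sqlalchemy flask

I have a Flask application that contains projects, articles and tags.

The abridged models.py is: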

project_articles = Table('project_articles',
    Base.metadata,
    Column('project_id', Integer, ForeignKey('project.id')),
    Column('article_id', Integer, ForeignKey('article.id'))
    )

article_tags = Table('article_tags',
    Base.metadata,
    Column('tag_id', Integer, ForeignKey('tag.id')),
    Column('article_id', Integer, ForeignKey('article.id'))
    )

class Project(Base):
    __tablename__ = 'project'
    id = Column(Integer, primary_key=True)
    articles = relationship('Article', secondary=project_articles, backref='project', lazy='dynamic')
    tags = association_proxy('articles', 'tags')

class Article(Base):
    __tablename__ = 'article'
    id = Column(Integer, primary_key=True)
    projects = relationship('Project', secondary=project_articles, backref='article')
    tags = relationship('Tag', secondary=article_tags, backref='article')
    date_created = Column(DateTime, default=datetime.now, nullable=False)

class Tag(Base):
    __tablename__ = 'tag'
    id = Column(Integer, primary_key=True)
    articles = relationship('Article', secondary=article_tags, backref='tag')
    text = Column(String)

I regularly run a query that returns all the articles related to a project that were created within a certain date range:

q = db.session.query(Article)
q = q.join(Article.project)
q = q.filter(Project.id == id)
q = q.filter(Article.date_created.between(now-timedelta(hours=1), now))
articles = q.all()

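I would also like to find all the tags related to the above subset of articles, but I need to know how many times each tag occurs (the same tag may be associated with multiple articles). I am currently doing this in Python: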

tags = [tag for article in articles for tag in article.tags]

But this is slow, and I am sure there is a SQLAlchemy query that can do it.

Note that I can do this:

q = db.session.query(Tag)
q = q.join(Tag.article)
q = q.join(Article.project)
q = q.filter(Project.id == 2)
q = q.filter(Article.date_created.between(now-timedelta(hours=1), now))
tags = q.all()

But this queries the Tag table, so it only gives me a list of unique tags, whereas I need to know how often each tag occurs.

Thanks.

1 answer:

Answer 0: (score: 2)

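Indeed, when you collect all the Tags of the articles returned by the query, a separate SQL statement is issued for each article, which can be slow.

Option 1: One approach is to eagerly load all the tags along with the original query, using either:

  1. joinedload, in which case the original query also fetches the tags through a JOIN
  2. subqueryload, in which case one additional query is issued that loads the tags of all the articles returned by the original query

Either way, you can keep your code and just add one option: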

    from sqlalchemy.orm import joinedload, subqueryload

    q = db.session.query(Article)
    q = q.join(Article.project)
    q = q.filter(Project.id == id)
    q = q.filter(Article.date_created.between(now-timedelta(hours=1), now))
    #q = q.options(joinedload(Article.tags)) # @new: load tags in the same SELECT through a JOIN
    q = q.options(subqueryload(Article.tags)) # @new: load tags for all articles with one additional SELECT
    articles = q.all()
    

And your tag collection code stays the same:

    tags = [tag for article in articles for tag in article.tags]
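
To make the difference between the loading strategies concrete, here is a minimal sketch that counts the SELECT statements issued while collecting tags. It assumes SQLAlchemy is installed and uses a trimmed-down model (only `Article` and `Tag`, no `Project`) on an in-memory SQLite database; the helper names `count_selects` and `collect_tags` are hypothetical and not part of the original code.

```python
from sqlalchemy import (Column, ForeignKey, Integer, String, Table,
                        create_engine, event)
from sqlalchemy.orm import (joinedload, relationship, sessionmaker,
                            subqueryload)

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

article_tags = Table(
    "article_tags", Base.metadata,
    Column("tag_id", Integer, ForeignKey("tag.id"), primary_key=True),
    Column("article_id", Integer, ForeignKey("article.id"), primary_key=True),
)


class Article(Base):
    __tablename__ = "article"
    id = Column(Integer, primary_key=True)
    tags = relationship("Tag", secondary=article_tags)


class Tag(Base):
    __tablename__ = "tag"
    id = Column(Integer, primary_key=True)
    text = Column(String)


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

# Three articles sharing two tags.
setup = Session()
python_tag, flask_tag = Tag(text="python"), Tag(text="flask")
setup.add_all([
    Article(tags=[python_tag, flask_tag]),
    Article(tags=[python_tag]),
    Article(tags=[flask_tag]),
])
setup.commit()
setup.close()

select_count = [0]


@event.listens_for(engine, "before_cursor_execute")
def count_selects(conn, cursor, statement, parameters, context, executemany):
    # Count only SELECT statements sent to the database.
    if statement.lstrip().upper().startswith("SELECT"):
        select_count[0] += 1


def collect_tags(*options):
    """Load all articles with the given loader options, then gather tag texts."""
    session = Session()
    select_count[0] = 0
    query = session.query(Article)
    if options:
        query = query.options(*options)
    articles = query.all()
    tags = [tag.text for article in articles for tag in article.tags]
    session.close()
    return sorted(tags), select_count[0]


lazy_tags, lazy_selects = collect_tags()
sub_tags, sub_selects = collect_tags(subqueryload(Article.tags))
joined_tags, joined_selects = collect_tags(joinedload(Article.tags))
```

With the default lazy loading, the number of SELECTs grows with the number of articles (one for the articles plus one per article), while `subqueryload` and `joinedload` keep it constant. The collected list is the same in all three cases and still contains repeated tags.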
    

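Option 2: Another approach is to use a separate query, much like the one you tried in your second code snippet. The reason you do not get duplicates from that query is that SQLAlchemy filters out duplicates at the ORM level when whole Tag objects are requested. To work around this, you can add a counter to the query: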

    from sqlalchemy import func

    q = db.session.query(Tag, func.count('*').label("cnt")) # @new: added COUNT
    q = q.join(Tag.article)
    q = q.join(Article.project)
    q = q.filter(Project.id == project_id)
    q = q.filter(Article.date_created.between(now-timedelta(hours=1), now))
    q = q.group_by(Tag) # @new: one result row per tag
    tags = q.all()
    return tags # @note: the result is a list of tuples: (Tag, cnt)
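
As a self-contained illustration of the counting query, the sketch below rebuilds a simplified version of the three models on an in-memory SQLite database. It is written under assumptions that differ from the original: it uses `back_populates` rather than the `backref` names above, groups by `Tag.id` and `Tag.text` explicitly, counts `Article.id`, and selects `Tag.text` so the rows are easy to compare. The function name `tag_counts` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta

from sqlalchemy import (Column, DateTime, ForeignKey, Integer, String, Table,
                        create_engine, func)
from sqlalchemy.orm import relationship, sessionmaker

try:  # SQLAlchemy 1.4 and later
    from sqlalchemy.orm import declarative_base
except ImportError:  # older releases
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

project_articles = Table(
    "project_articles", Base.metadata,
    Column("project_id", Integer, ForeignKey("project.id"), primary_key=True),
    Column("article_id", Integer, ForeignKey("article.id"), primary_key=True),
)
article_tags = Table(
    "article_tags", Base.metadata,
    Column("tag_id", Integer, ForeignKey("tag.id"), primary_key=True),
    Column("article_id", Integer, ForeignKey("article.id"), primary_key=True),
)


class Project(Base):
    __tablename__ = "project"
    id = Column(Integer, primary_key=True)
    articles = relationship("Article", secondary=project_articles,
                            back_populates="projects")


class Article(Base):
    __tablename__ = "article"
    id = Column(Integer, primary_key=True)
    date_created = Column(DateTime, default=datetime.now, nullable=False)
    projects = relationship("Project", secondary=project_articles,
                            back_populates="articles")
    tags = relationship("Tag", secondary=article_tags,
                        back_populates="articles")


class Tag(Base):
    __tablename__ = "tag"
    id = Column(Integer, primary_key=True)
    text = Column(String)
    articles = relationship("Article", secondary=article_tags,
                            back_populates="tags")


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

now = datetime.now()
python_tag, flask_tag = Tag(text="python"), Tag(text="flask")
project = Project(articles=[
    Article(date_created=now - timedelta(minutes=10),
            tags=[python_tag, flask_tag]),
    Article(date_created=now - timedelta(minutes=20), tags=[python_tag]),
    # Created two days ago, so it falls outside the one-hour window.
    Article(date_created=now - timedelta(days=2), tags=[flask_tag]),
])
session.add(project)
session.commit()


def tag_counts(session, project_id, start, end):
    """Return (tag text, article count) rows for one project and date range."""
    article_count = func.count(Article.id)
    return (
        session.query(Tag.text, article_count.label("cnt"))
        .join(Tag.articles)
        .join(Article.projects)
        .filter(Project.id == project_id)
        .filter(Article.date_created.between(start, end))
        .group_by(Tag.id, Tag.text)
        .order_by(article_count.desc(), Tag.text)
        .all()
    )


counts = [tuple(row)
          for row in tag_counts(session, project.id,
                                now - timedelta(hours=1), now)]
```

Here `counts` holds one `(text, cnt)` pair per tag, most frequent first; the article created two days ago does not contribute because of the date filter.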
    

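Another trick is to tell SQLAlchemy to return only some columns of Tag instead of the ORM object (Tag); in that case SQLAlchemy returns all rows without filtering out duplicates: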

    q = db.session.query(Tag.text) # @new:modified
    q = q.join(Tag.article)
    q = q.join(Article.project)
    q = q.filter(Project.id == project_id)
    q = q.filter(Article.date_created.between(now-timedelta(hours=1), now))
    tags = q.all()
    return tags # @note: the result is a list of tuples: (tag_name,)
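
Because this variant returns one row per article-tag pair, the frequencies can then be tallied in Python. A minimal sketch using `collections.Counter`; the `rows` list below is hypothetical sample data shaped like the one-element tuples that the query above returns.

```python
from collections import Counter

# Hypothetical rows, shaped like the result of the query on Tag.text:
# one single-element tuple per (article, tag) pair.
rows = [("python",), ("flask",), ("python",), ("sqlalchemy",), ("python",)]

# Unpack each one-element row and count how often each tag text occurs.
tag_counts = Counter(text for (text,) in rows)

# The single most frequent tag, as a list of (text, count) pairs.
top_tag = tag_counts.most_common(1)
```

This moves the counting to the client side, so for large result sets the GROUP BY version is likely the better fit.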