I have a Flask application with projects, articles and tags.
The abridged models.py is:
from datetime import datetime

from sqlalchemy import Column, DateTime, ForeignKey, Integer, String, Table
from sqlalchemy.ext.associationproxy import association_proxy
from sqlalchemy.orm import declarative_base, relationship

Base = declarative_base()  # assumed: the full models.py defines Base elsewhere (SQLAlchemy 1.4+ import path)

# Association table: many-to-many between projects and articles
project_articles = Table('project_articles',
    Base.metadata,
    Column('project_id', Integer, ForeignKey('project.id')),
    Column('article_id', Integer, ForeignKey('article.id'))
)

# Association table: many-to-many between articles and tags
article_tags = Table('article_tags',
    Base.metadata,
    Column('tag_id', Integer, ForeignKey('tag.id')),
    Column('article_id', Integer, ForeignKey('article.id'))
)

class Project(Base):
    __tablename__ = 'project'
    id = Column(Integer, primary_key=True)
    articles = relationship('Article', secondary=project_articles, backref='project', lazy='dynamic')
    tags = association_proxy('articles', 'tags')

class Article(Base):
    __tablename__ = 'article'
    id = Column(Integer, primary_key=True)
    projects = relationship('Project', secondary=project_articles, backref='article')
    tags = relationship('Tag', secondary=article_tags, backref='article')
    date_created = Column(DateTime, default=datetime.now, nullable=False)

class Tag(Base):
    __tablename__ = 'tag'
    id = Column(Integer, primary_key=True)
    articles = relationship('Article', secondary=article_tags, backref='tag')
    text = Column(String)
I frequently run a query that returns all articles associated with a project and created within a given date range:
q = db.session.query(Article)
q = q.join(Article.project)
q = q.filter(Project.id == id)
q = q.filter(Article.date_created.between(now-timedelta(hours=1), now))
articles = q.all()
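I also want to find all tags associated with the subset of articles above, but I need to know how many times each tag occurs (the same tag can be associated with several articles). Currently I do this in Python: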
tags = [tag for article in articles for tag in article.tags]
But this is slow, and I'm sure there is an SQLAlchemy query that can do it.
Note that I can do this:
q = db.session.query(Tag)
q = q.join(Tag.article)
q = q.join(Article.project)
q = q.filter(Project.id == 2)
q = q.filter(Article.date_created.between(now-timedelta(hours=1), now))
tags = q.all()
But this filters on the Tag table, so it only gives me a list of unique tags, whereas I need to know how often each tag occurs.
Thanks.
Answer 0 (score: 2)
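Indeed, when you collect all the Tags of the articles returned by the query, a separate SQL query is issued for each article (the Article.tags relationship is lazy-loaded by default), which can be slow.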
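To see where the time goes, the following sketch uses only the standard-library sqlite3 module (not SQLAlchemy) to imitate what lazy loading does: one SELECT for the articles, then one more SELECT per article for its tags. The table names follow the models above, the sample rows are made up, and set_trace_callback is used only to record each statement that gets executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample data using the same table names as the models.
conn.executescript("""
CREATE TABLE article (id INTEGER PRIMARY KEY);
CREATE TABLE tag (id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE article_tags (tag_id INTEGER, article_id INTEGER);
INSERT INTO article (id) VALUES (1), (2), (3);
INSERT INTO tag (id, text) VALUES (1, 'python'), (2, 'sql');
INSERT INTO article_tags VALUES (1, 1), (2, 1), (1, 2), (1, 3);
""")

statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement executed from now on

# One query for the articles...
article_ids = [row[0] for row in conn.execute("SELECT id FROM article")]

# ...then one query per article for its tags (the "N + 1" pattern of lazy loading).
tags = []
for article_id in article_ids:
    tags += [row[0] for row in conn.execute(
        "SELECT tag.text FROM tag JOIN article_tags ON tag.id = article_tags.tag_id "
        "WHERE article_tags.article_id = ?", (article_id,))]

print(len(statements))  # one statement for the articles plus one per article
```

With N articles this means N + 1 round trips to the database; both options below bring it down to one or two queries.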
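Option 1: One way is to eager load all the tags as part of the original query, using either joinedload, in which case the original query also fetches the tags (through a LEFT OUTER JOIN), or subqueryload, in which case one additional query is issued while the articles are being loaded, fetching the tags of all articles returned by the original query at once. Either way you can keep your code and just add an option: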
from sqlalchemy.orm import joinedload, subqueryload

q = db.session.query(Article)
q = q.join(Article.project)
q = q.filter(Project.id == id)
q = q.filter(Article.date_created.between(now-timedelta(hours=1), now))
#q = q.options(joinedload(Article.tags))  # @new: load the tags in the same query via a JOIN
q = q.options(subqueryload(Article.tags))  # @new: load the tags of all these articles in one extra query
articles = q.all()
and your collection code stays the same:
tags = [tag for article in articles for tag in article.tags]
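Option 1 still leaves the counting to Python, but once the tags are already loaded that step is cheap. A minimal sketch with collections.Counter; FakeTag and FakeArticle are hypothetical stand-ins for the real models so the snippet runs without a database, and the counting line would look the same with actual Article objects.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FakeTag:  # hypothetical stand-in for the Tag model
    id: int
    text: str


@dataclass
class FakeArticle:  # hypothetical stand-in for the Article model
    tags: list = field(default_factory=list)


python_tag, sql_tag = FakeTag(1, "python"), FakeTag(2, "sql")
articles = [FakeArticle([python_tag, sql_tag]), FakeArticle([python_tag]), FakeArticle([python_tag])]

# Count how many of the selected articles carry each tag text.
tag_counts = Counter(tag.text for article in articles for tag in article.tags)
print(tag_counts.most_common())  # [('python', 3), ('sql', 1)]
```

Counting by tag.text merges tags that share the same text; with the real models you could count the Tag objects themselves, since the session's identity map returns a single Tag instance per primary key.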
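Option 2: Another way is to use a separate query, as you actually tried in your second snippet. The reason you don't see duplicates in that query's result is that SQLAlchemy removes duplicate Tag objects at the ORM level when the query returns whole entities. To work around this, you can add a count to the query and group by tag: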
from sqlalchemy import func

q = db.session.query(Tag, func.count(Article.id).label("cnt"))  # @new: added COUNT of matching articles
q = q.join(Tag.article)
q = q.join(Article.project)
q = q.filter(Project.id == project_id)
q = q.filter(Article.date_created.between(now-timedelta(hours=1), now))
q = q.group_by(Tag)  # @new: one result row per tag
tags = q.all()
return tags  # @note: the result is a list of tuples: (Tag, cnt)
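Roughly, this query compiles to a GROUP BY over the joined tables. The sketch below runs comparable SQL with the standard-library sqlite3 module against made-up rows, with a fixed time window standing in for now - timedelta(hours=1) and now. The SQL that SQLAlchemy actually emits will differ in aliases and in the Tag columns it selects, and the ORDER BY is added here only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up sample data: articles 1 and 2 fall inside the window, article 3 does not.
conn.executescript("""
CREATE TABLE project (id INTEGER PRIMARY KEY);
CREATE TABLE article (id INTEGER PRIMARY KEY, date_created TEXT NOT NULL);
CREATE TABLE tag (id INTEGER PRIMARY KEY, text TEXT);
CREATE TABLE project_articles (project_id INTEGER, article_id INTEGER);
CREATE TABLE article_tags (tag_id INTEGER, article_id INTEGER);
INSERT INTO project (id) VALUES (2);
INSERT INTO article (id, date_created) VALUES
    (1, '2024-01-01 10:30:00'), (2, '2024-01-01 10:45:00'), (3, '2024-01-01 08:00:00');
INSERT INTO tag (id, text) VALUES (1, 'python'), (2, 'sql');
INSERT INTO project_articles VALUES (2, 1), (2, 2), (2, 3);
INSERT INTO article_tags VALUES (1, 1), (2, 1), (1, 2), (1, 3);
""")

# One row per tag, with the number of matching articles that carry it.
rows = conn.execute("""
    SELECT tag.id, tag.text, COUNT(article.id) AS cnt
    FROM tag
    JOIN article_tags ON tag.id = article_tags.tag_id
    JOIN article ON article.id = article_tags.article_id
    JOIN project_articles ON article.id = project_articles.article_id
    JOIN project ON project.id = project_articles.project_id
    WHERE project.id = ?
      AND article.date_created BETWEEN ? AND ?
    GROUP BY tag.id, tag.text
    ORDER BY cnt DESC
""", (2, "2024-01-01 10:00:00", "2024-01-01 11:00:00")).fetchall()

print(rows)  # [(1, 'python', 2), (2, 'sql', 1)]
```

The counting happens in the database, so only one row per distinct tag travels back to the application.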
Another trick is to tell SQLAlchemy to return only some columns of Tag instead of ORM Tag objects; in that case SQLAlchemy returns every row without removing duplicates:
q = db.session.query(Tag.text) # @new:modified
q = q.join(Tag.article)
q = q.join(Article.project)
q = q.filter(Project.id == project_id)
q = q.filter(Article.date_created.between(now-timedelta(hours=1), now))
tags = q.all()
return tags # @note: the result is a list of tuples: (tag_name,)
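These one-element rows still contain one entry per matching article, so the frequencies can be tallied in Python afterwards. A minimal sketch, with a hard-coded rows list standing in for the result of q.all():

```python
from collections import Counter

# Stand-in for q.all(): one single-element row per (tag, article) match.
rows = [("python",), ("sql",), ("python",)]

# Unpack each one-element row and count occurrences of each tag name.
tag_counts = Counter(text for (text,) in rows)
print(tag_counts["python"], tag_counts["sql"])  # 2 1
```

Unlike the GROUP BY version, this transfers one row per match, so it is best suited to small result sets.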