I have an application that uses SQLAlchemy. Here is part of the object model:
class HtmlTaggingDataset(Base):
    __tablename__ = "htmlTaggingDatasets"
    id = Column(Integer, primary_key=True)
    name = Column(String)

class TrainingHtml(Base):
    __tablename__ = "trainingHtml"
    id = Column(Integer, primary_key=True)
    datasetId = Column(Integer, ForeignKey("htmlTaggingDatasets.id"))
    processingState = Column(Integer)
    originalSrc = Column(String)
    taggedSrc = Column(String)
    dataset = relationship("HtmlTaggingDataset", backref=backref("items", order_by=id))
originalSrc and taggedSrc are fairly long strings, up to several MB each.
Now I want to count the instances grouped by processingState.
If I do this:
html_datasets = cherrypy.request.db.query(HtmlTaggingDataset).all()
for ds in html_datasets:
    ds.total = len(ds.items)
    ds.remaining = len([i for i in ds.items if i.processingState == PROCESSING_STATE_NEW])
    ds.in_progress = len([i for i in ds.items if i.processingState == PROCESSING_STATE_IN_PROGRESS])
    ds.flagged = len([i for i in ds.items if i.processingState == PROCESSING_STATE_FLAGGED])
then memory usage shoots up. But if I do this instead:
html_datasets = cherrypy.request.db.query(HtmlTaggingDataset).all()
for ds in html_datasets:
    items = cherrypy.request.db.query(TrainingHtml).filter_by(datasetId=ds.id)
    ds.total = items.count()
    ds.remaining = items.filter_by(processingState=PROCESSING_STATE_NEW).count()
    ds.in_progress = items.filter_by(processingState=PROCESSING_STATE_IN_PROGRESS).count()
    ds.flagged = items.filter_by(processingState=PROCESSING_STATE_FLAGGED).count()
memory usage stays relatively low.
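For comparison, the same per-state counts can probably also be fetched in a single aggregate query, which, like the second version, never loads originalSrc or taggedSrc into Python. This is only a minimal sketch: it assumes SQLAlchemy 1.4 or later, uses an in-memory SQLite database for the demo, and invents values for the PROCESSING_STATE_* constants, since the real ones are not shown here. count_by_state is a hypothetical helper name.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

# Assumed values; the real constants are not shown in the question.
PROCESSING_STATE_NEW = 0
PROCESSING_STATE_IN_PROGRESS = 1
PROCESSING_STATE_FLAGGED = 2


class HtmlTaggingDataset(Base):
    __tablename__ = "htmlTaggingDatasets"
    id = Column(Integer, primary_key=True)
    name = Column(String)


class TrainingHtml(Base):
    __tablename__ = "trainingHtml"
    id = Column(Integer, primary_key=True)
    datasetId = Column(Integer, ForeignKey("htmlTaggingDatasets.id"))
    processingState = Column(Integer)
    originalSrc = Column(String)
    taggedSrc = Column(String)


def count_by_state(session):
    """Return {datasetId: {processingState: count}} from one GROUP BY query."""
    rows = (
        session.query(
            TrainingHtml.datasetId,
            TrainingHtml.processingState,
            func.count(TrainingHtml.id),
        )
        .group_by(TrainingHtml.datasetId, TrainingHtml.processingState)
        .all()
    )
    counts = {}
    for dataset_id, state, n in rows:
        counts.setdefault(dataset_id, {})[state] = n
    return counts


# Demo data: one dataset with two NEW items and one IN_PROGRESS item.
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
with Session(engine) as session:
    session.add(HtmlTaggingDataset(id=1, name="demo"))
    session.add_all(
        TrainingHtml(datasetId=1, processingState=s, originalSrc="<p>x</p>", taggedSrc="")
        for s in (PROCESSING_STATE_NEW, PROCESSING_STATE_NEW, PROCESSING_STATE_IN_PROGRESS)
    )
    session.commit()
    counts = count_by_state(session)
```

The total for a dataset is then sum(counts[ds.id].values()), and a state with no rows is simply absent from the inner dict, so counts[ds.id].get(PROCESSING_STATE_FLAGGED, 0) covers the empty case.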
My question: