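I have a function that populates a database table using Python and SQLAlchemy. It currently runs quite slowly, taking about 17 minutes. I think the main problem is that I loop over two large sets of data to build the new table. I've included the record counts in the code below.
How can I speed this up? Should I try converting the nested for loops into one big SQLAlchemy query? I profiled the function with PyCharm, but I'm not sure I fully understand the results.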
def populate(self):
    """Core function to populate positions."""
    # get raw annotations with tag Org
    # returns 11,659 records
    organizations = model.session.query(model.Annotation) \
        .filter(model.Annotation.tag == 'Org') \
        .filter(model.Annotation.organization_id.isnot(None)).all()
    # get raw annotations with tags Support or Oppose
    # returns 2,947 records
    annotations = model.session.query(model.Annotation) \
        .filter((model.Annotation.tag == 'Support') | (model.Annotation.tag == 'Oppose')).all()
    for org in organizations:
        for anno in annotations:
            # Org overlaps with Support or Oppose tag
            # start and end columns are integers
            if org.start >= anno.start and org.end <= anno.end:
                position = model.Position()
                # set to de-duplicated organization
                position.organization_id = org.organization_id
                position.disposition = anno.tag
                # look up bill_id from document_bill table
                document = model.session.query(model.document_bill) \
                    .filter_by(document_id=anno.document_id).first()
                position.bill_id = document.bill_id
                position.document_id = anno.document_id
                model.session.add(position)
                logging.info('org: {}, disposition: {}, bill: {}'.format(
                    position.organization_id, position.disposition, position.bill_id)
                )
                continue
    logging.info('committing to database')
    model.session.commit()
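For reference, here is a rough sketch of what the "one big query" version might look like, written against an in-memory SQLite database with the standard-library sqlite3 module so it runs on its own. The table names (annotation, document_bill, position), the column layout, and the sample rows are assumptions inferred from the function above, not the real schema. It also assumes each document maps to exactly one bill; the loop above takes only the first document_bill row, whereas a join would emit one row per match.

```python
import sqlite3

# In-memory database with a hypothetical schema inferred from populate().
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE annotation (
        id INTEGER PRIMARY KEY,
        tag TEXT,
        organization_id INTEGER,
        document_id INTEGER,
        "start" INTEGER,
        "end" INTEGER
    );
    CREATE TABLE document_bill (document_id INTEGER, bill_id INTEGER);
    CREATE TABLE "position" (
        id INTEGER PRIMARY KEY,
        organization_id INTEGER,
        disposition TEXT,
        bill_id INTEGER,
        document_id INTEGER
    );
""")

# Made-up sample rows, only to exercise the query.
conn.executemany(
    'INSERT INTO annotation (tag, organization_id, document_id, "start", "end") '
    "VALUES (?, ?, ?, ?, ?)",
    [
        ("Org", 7, 1, 10, 20),        # lies inside the Support span
        ("Org", 8, 1, 200, 210),      # lies inside no span
        ("Org", None, 1, 12, 18),     # no de-duplicated organization: skipped
        ("Support", None, 1, 5, 50),
        ("Oppose", None, 2, 300, 400),
    ],
)
conn.executemany("INSERT INTO document_bill VALUES (?, ?)", [(1, 99), (2, 55)])

# One INSERT ... SELECT replaces both loops and the per-match bill lookup.
# The join condition mirrors the Python test exactly:
#   org.start >= anno.start and org.end <= anno.end
POPULATE_SQL = """
    INSERT INTO "position" (organization_id, disposition, bill_id, document_id)
    SELECT org.organization_id, anno.tag, db.bill_id, anno.document_id
    FROM annotation AS org
    JOIN annotation AS anno
      ON org."start" >= anno."start" AND org."end" <= anno."end"
    JOIN document_bill AS db
      ON db.document_id = anno.document_id
    WHERE org.tag = 'Org'
      AND org.organization_id IS NOT NULL
      AND anno.tag IN ('Support', 'Oppose')
"""
conn.execute(POPULATE_SQL)
conn.commit()

rows = conn.execute(
    'SELECT organization_id, disposition, bill_id, document_id FROM "position"'
).fetchall()
print(rows)
```

With SQLAlchemy, a statement like this could presumably be sent through model.session.execute(sqlalchemy.text(...)), so the database does the matching and no ORM objects are built in Python. Note that an inner join silently skips annotations with no document_bill row, where the loop above would raise on document.bill_id.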
Answer 0 (score: 0)
My bets, in decreasing order of probability: