Question

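I have a function that populates a database table using Python and SQLAlchemy. It currently runs quite slowly, taking about 17 minutes. I think the main problem is that I loop over two large sets of data to build the new table. I've included the record counts in the code below.

How can I speed this up? Should I try to convert the nested for loops into one big SQLAlchemy query? I profiled the function with PyCharm, but I'm not sure I fully understand the results.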

def populate(self):
    """Core function to populate positions."""

    # get raw annotations with tag Org
    # returns 11,659 records
    organizations = model.session.query(model.Annotation) \
        .filter(model.Annotation.tag == 'Org')\
        .filter(model.Annotation.organization_id.isnot(None)).all()

    # get raw annotations with tags Support or Oppose
    # returns 2,947 records
    annotations = model.session.query(model.Annotation) \
        .filter((model.Annotation.tag == 'Support') | (model.Annotation.tag == 'Oppose')).all()

    for org in organizations:
        for anno in annotations:

            # Org overlaps with Support or Oppose tag
            # start and end columns are integers
            if org.start >= anno.start and org.end <= anno.end:
                position = model.Position()
                # set to de-duplicated organization
                position.organization_id = org.organization_id
                position.disposition = anno.tag
                # look up bill_id from document_bill table
                document = model.session.query(model.document_bill)\
                    .filter_by(document_id=anno.document_id).first()
                position.bill_id = document.bill_id
                position.document_id = anno.document_id
                model.session.add(position)
                logging.info('org: {}, disposition: {}, bill: {}'.format(
                    position.organization_id, position.disposition, position.bill_id)
                )
                continue
        logging.info('committing to database')
        model.session.commit()

Answer 1

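My bets, in decreasing order of probability:

Autocommit is on, so you are waiting on the disk.
The query inside the loop, "document = model.session.query(model.document_bill)....", is slow (use EXPLAIN ANALYZE).
Most of the time is actually spent printing logs to the terminal in the inner loop (you should profile).
model.session.add(position) is slow (I don't know what it does).
(This one should really be first.) Could a SQL query such as INSERT INTO ... SELECT do this in a few dozen milliseconds? If so, why loop in the application at all? ...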
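
To make the last point concrete, here is a minimal sketch of what a single INSERT INTO ... SELECT could look like. It uses the standard-library sqlite3 module so that it runs on its own. The table names (annotation, document_bill, position) and the column layout are assumptions inferred from the question's code, not the asker's real schema. The join condition mirrors the loop's overlap test exactly, and the scalar subquery with LIMIT 1 stands in for the `.first()` lookup.

```python
import sqlite3

# Hypothetical schema inferred from the question's code; the real table and
# column names may differ.
SCHEMA = """
CREATE TABLE annotation (
    id INTEGER PRIMARY KEY,
    tag TEXT,
    organization_id INTEGER,
    document_id INTEGER,
    "start" INTEGER,
    "end" INTEGER
);
CREATE TABLE document_bill (document_id INTEGER, bill_id INTEGER);
CREATE TABLE position (
    id INTEGER PRIMARY KEY,
    organization_id INTEGER,
    disposition TEXT,
    bill_id INTEGER,
    document_id INTEGER
);
"""

# One statement replaces both Python loops: the self-join pairs every 'Org'
# annotation with every Support/Oppose annotation that contains it.
POPULATE_SQL = """
INSERT INTO position (organization_id, disposition, bill_id, document_id)
SELECT org.organization_id,
       anno.tag,
       (SELECT db.bill_id FROM document_bill AS db
         WHERE db.document_id = anno.document_id LIMIT 1),
       anno.document_id
  FROM annotation AS org
  JOIN annotation AS anno
    ON org."start" >= anno."start" AND org."end" <= anno."end"
 WHERE org.tag = 'Org'
   AND org.organization_id IS NOT NULL
   AND anno.tag IN ('Support', 'Oppose')
"""


def populate(conn):
    """Fill the position table with one statement and one commit."""
    conn.execute(POPULATE_SQL)
    conn.commit()


def demo():
    """Build a tiny in-memory database and return the populated rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        'INSERT INTO annotation (tag, organization_id, document_id, "start", "end") '
        "VALUES (?, ?, ?, ?, ?)",
        [
            ("Org", 7, 1, 10, 20),         # lies inside the Support span
            ("Org", None, 1, 12, 18),      # skipped: no organization_id
            ("Org", 8, 1, 200, 210),       # lies outside every span
            ("Support", None, 1, 0, 100),
            ("Oppose", None, 2, 150, 180),
        ],
    )
    conn.execute("INSERT INTO document_bill VALUES (1, 42)")
    populate(conn)
    rows = conn.execute(
        "SELECT organization_id, disposition, bill_id, document_id FROM position"
    ).fetchall()
    conn.close()
    return rows
```

With SQLAlchemy, a statement like this could presumably be sent through `session.execute(text(...))` followed by a single `session.commit()`. That would also remove the per-row lookup, the per-row logging and the per-organization commit from the first three bets. Whether it really finishes in tens of milliseconds depends on the database and its indexes, so it is worth checking with EXPLAIN ANALYZE.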
