Python: slow loop

Time: 2021-03-03 17:39:47

Tags: python json api oracle-sqldeveloper atom-editor

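The script takes far longer to run than expected. For only 1250 records, looping and inserting into the table takes more than 20 minutes. Please let me know whether this is normal.

The script below extracts 11 columns from the API (JSON) and loads each row into an Oracle table.

Script: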

Is there a way to use an index, or do you have any other suggestions?

import json
import requests

auth_values = (user, passwd)
response = requests.get(url, auth=auth_values)
json_data = json.loads(response.text)

for data in json_data['result']:
    branchFullName = data['full_name']
    branchNum = data['u_branch_id']
    branchName = data['u_branch_name']
    sysId = data['sys_id']
    sys_updated_on = data['sys_updated_on']
    sys_created_on = data['sys_created_on']
    # One INSERT and one commit per record; values follow the column list order.
    cursor.execute(
        "INSERT INTO " + PrestageTable + " (BRANCH_FULL_NAME, BRANCH_NUM, "
        "BRANCH_NAME, SYS_ID, SYS_CREATED_ON, SYS_UPDATED_ON) "
        "VALUES (:1, :2, :3, :4, :5, :6)",
        (branchFullName, branchNum, branchName, sysId, sys_created_on, sys_updated_on))
    con.commit()

Added the JSON file.

Updated:

insert_data = []
for data in json_data['result']:
    branchFullName = data['full_name']
    branchNum = data['u_branch_id']
    branchName = data['u_branch_name']
    sysId = data['sys_id']
    sys_updated_on = data['sys_updated_on']
    sys_created_on = data['sys_created_on']
    insert_data.append(
        (branchFullName, branchNum, branchName, sysId, sys_updated_on, sys_created_on)
    )

    args_str = ','.join(cur.mogrify("(%s,%s,%s,%s,%s,%s)", x) for x in insert_data)
cursor.execute(f"INSERT INTO {PrestageTable} VALUES " + args_str)
con.commit()

4 answers:

Answer 0 (score: 1)

As I said, you should insert multiple rows at once and call execute only once.

Try this:

insert_data = []
for data in json_data['result']:
    ...  # branchFullName, branchNum, etc. variables
    insert_data.append(
        (branchFullName, branchNum, branchName, sysId, sys_updated_on, sys_created_on)
    )

# Note: mogrify() is a psycopg2 (PostgreSQL) cursor method and returns bytes
# there, hence the decode(); cx_Oracle cursors do not provide it.
args_str = ','.join(
    cursor.mogrify("(%s,%s,%s,%s,%s,%s)", x).decode() for x in insert_data
)
cursor.execute(f"INSERT INTO {PrestageTable} VALUES " + args_str)
con.commit()

Note that execute is called outside the loop.

Answer 1 (score: 1)

Use cursor.executemany() to insert all rows at once. This requires you to build a two-dimensional list of parameters, one entry per row.

params = []
for data in json_data['result']:
    branchFullName = data['full_name']
    branchNum = data['u_branch_id']
    branchName = data['u_branch_name']
    sysId = data['sys_id']
    sys_updated_on = data['sys_updated_on']
    sys_created_on = data['sys_created_on']
    # Tuple order must match the column list in the INSERT below.
    params.append((branchFullName, branchNum, branchName, sysId, sys_created_on, sys_updated_on))

# A single executemany() call and a single commit for all rows.
cursor.executemany(
    "INSERT INTO " + PrestageTable + " (BRANCH_FULL_NAME, BRANCH_NUM, "
    "BRANCH_NAME, SYS_ID, SYS_CREATED_ON, SYS_UPDATED_ON) "
    "VALUES (:1, :2, :3, :4, :5, :6)",
    params)
con.commit()
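
As a rough illustration of the executemany() approach, here is a minimal sketch that runs end to end. It assumes an in-memory sqlite3 database as a stand-in for the Oracle connection (so the placeholders are `?` rather than `:1`), and the table name `prestage_demo` and the sample records are made up for the example. With an Oracle driver the shape should be the same, with the connection and bind style changed accordingly.

```python
import sqlite3

# Hypothetical sample shaped like the API response in the question.
json_data = {
    "result": [
        {
            "full_name": f"Branch {i}",
            "u_branch_id": str(i),
            "u_branch_name": f"B{i}",
            "sys_id": f"id-{i}",
            "sys_created_on": "2021-03-01 10:00:00",
            "sys_updated_on": "2021-03-02 11:00:00",
        }
        for i in range(1250)
    ]
}

# sqlite3 stands in for the Oracle connection (an assumption of this sketch).
con = sqlite3.connect(":memory:")
cursor = con.cursor()
cursor.execute(
    "CREATE TABLE prestage_demo ("
    "BRANCH_FULL_NAME TEXT, BRANCH_NUM TEXT, BRANCH_NAME TEXT, "
    "SYS_ID TEXT, SYS_CREATED_ON TEXT, SYS_UPDATED_ON TEXT)"
)

# Build one tuple per row, in the same order as the column list below.
params = [
    (
        d["full_name"], d["u_branch_id"], d["u_branch_name"],
        d["sys_id"], d["sys_created_on"], d["sys_updated_on"],
    )
    for d in json_data["result"]
]

# One executemany() call and one commit for the whole batch.
cursor.executemany(
    "INSERT INTO prestage_demo (BRANCH_FULL_NAME, BRANCH_NUM, BRANCH_NAME, "
    "SYS_ID, SYS_CREATED_ON, SYS_UPDATED_ON) VALUES (?, ?, ?, ?, ?, ?)",
    params,
)
con.commit()

row_count = cursor.execute("SELECT COUNT(*) FROM prestage_demo").fetchone()[0]
print(row_count)
```

The point of the sketch is the structure: the loop only collects tuples, and the database is touched once for the inserts and once for the commit, instead of once per record.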

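Answer 2 (score: 0)

Something is missing from this code. The loop itself takes almost no time to execute, so the problem is either in retrieving the data or in inserting into Oracle. First, I suggest finding out where the time actually goes; a profiling tool such as perf_tool can help a lot. It is hard to guess what is wrong here, but I think that after a few checks you will find that the problem is writing to the database, so the solution may be to do batch inserts or to deal with the indexes.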
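
To find out which phase is slow without committing to any particular profiler, a rough split can be obtained with the standard library's time.perf_counter. The sketch below is only an illustration: the `timed` helper and the `timings` dictionary are hypothetical names, and the work inside each block is placeholder code standing in for the real request, loop, and insert.

```python
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start


# Placeholder work; replace each body with the real call being measured.
with timed("fetch"):
    # Stands in for requests.get(...) and json.loads(...).
    records = [{"sys_id": str(i)} for i in range(1250)]

with timed("build rows"):
    rows = [(r["sys_id"],) for r in records]

with timed("insert"):
    # Stands in for cursor.executemany(...) and con.commit().
    time.sleep(0.01)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.4f} s")
```

If the "insert" phase dominates, the database writes are the bottleneck and batching is the natural next step; if "fetch" dominates, the API call is the place to look.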

Answer 3 (score: 0)

I made the changes below, and it now runs much faster. Code:


Updated:

insert_data = []
for data in json_data['result']:
    branchFullName = data['full_name']
    branchNum = data['u_branch_id']
    branchName = data['u_branch_name']
    sysId = data['sys_id']
    sys_updated_on = data['sys_updated_on']
    sys_created_on = data['sys_created_on']
    insert_data.append(
        (branchFullName, branchNum, branchName, sysId, sys_updated_on, sys_created_on)
    )

# The rows are passed as the second argument, not concatenated into the SQL.
# Without a column list, the tuple order must match the table's column order.
cursor.executemany(
    f"INSERT INTO {PrestageTable} VALUES (:1, :2, :3, :4, :5, :6)", insert_data)
con.commit()