Transaction rollback

Date: 2019-11-27 09:40:26

Tags: python mysql transactions rollback

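I have a large list that itself consists of 53,000,000 smaller lists as elements. I want to commit each of these smaller lists to the db as a row, in consecutive batches of 1,000,000. That is, each time the script connects to the db it commits 1,000,000 elements, then disconnects, then connects again to commit the next 1,000,000 rows.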
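The batching itself can be sketched with a small generator. `batched_rows` is a hypothetical helper, not part of the asker's script. It pulls at most `batch_size` rows at a time from any iterable, so the full list never has to be sliced or copied. On Python 3.12+ the standard `itertools.batched` does the same job but yields tuples.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence


def batched_rows(rows: Iterable[Sequence], batch_size: int) -> Iterator[List[Sequence]]:
    """Yield consecutive lists of at most batch_size rows (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))  # take the next batch_size rows
        if not batch:
            return  # input exhausted
        yield batch


# Made-up sample rows: 5 rows in batches of 2 give batch sizes 2, 2, 1.
sample = [
    ("ACC1", "1", "taxa a"), ("ACC2", "2", "taxa b"), ("ACC3", "3", "taxa c"),
    ("ACC4", "4", "taxa d"), ("ACC5", "5", "taxa e"),
]
print([len(b) for b in batched_rows(sample, 2)])  # [2, 2, 1]
```

With 53,000,000 rows and `batch_size=1_000_000` this would yield 53 batches.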

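My problem is this: if an error occurs partway through, for example after 50,000,000 rows have been committed, I need to delete all the rows in the database and try committing everything again from the beginning.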

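I thought I could use rollback() to remove all 50,000,000 rows added so far, but because I am committing in a loop, I don't know how to roll back all 50,000,000 rows that were committed batch by batch.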

Does anyone have a suggestion?

Here is my script; `results` is the list that contains the 53,000,000 smaller lists as elements.

batch = []
counter = 0
BATCH_SIZE = 1000000
cursor_count = 0

def prepare_names(names):
    return [w.replace("'", '') for w in names]

for i in range(len(results)):
    if counter < BATCH_SIZE:
        batch.append(prepare_names([results[i][0], results[i][1], results[i][2]]))  # batch => [[accession_number, taxon_id, taxonomy], ...]
        counter += 1
    else:
        batch.append(prepare_names([results[i][0], results[i][1], results[i][2]]))

        values = (", ").join([f"('{d[0]}', '{d[1]}', '{d[2]}')" for d in batch])
        sql = f"INSERT INTO accession_taxonomy(accession_number, taxon_id, taxonomy) VALUES {values}"

        try:
            cursor.execute(sql)
            db.commit()
        except Exception as exception:
            print(exception)
            print(f"Problem with query: {sql}")

        print(cursor.rowcount, "Records Inserted")
        cursor_count += cursor.rowcount
        counter = 0
        batch = []
else:
    if batch:
        values = (", ").join([f"('{d[0]}', '{d[1]}', '{d[2]}')" for d in batch])
        sql = f"INSERT INTO accession_taxonomy(accession_number, taxon_id, taxonomy) VALUES {values}"

        try:
            cursor.execute(sql)
            db.commit()
        except Exception as exception:
            print(exception)
            print(f"Problem with query: {sql}")

        print(cursor.rowcount, "Records Inserted")
        cursor_count += cursor.rowcount

print("Total Number Of %s Rows Has Been Added." %(cursor_count))
db.close()
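A possible alternative to building the VALUES string by hand (and stripping quotes with `prepare_names`) is to let the driver bind the values as parameters. The sketch below uses the standard-library `sqlite3` module as a stand-in for MySQL so it runs without a server; the helper `insert_batch` and the sample rows are made up for illustration. With MySQL drivers such as mysql-connector-python or PyMySQL the placeholder is `%s` rather than `?`.

```python
import sqlite3
from typing import Sequence

# sqlite3 stands in for MySQL here; sqlite3 uses "?" placeholders, MySQL drivers use "%s".
INSERT_SQL = (
    "INSERT INTO accession_taxonomy(accession_number, taxon_id, taxonomy) "
    "VALUES (?, ?, ?)"
)


def insert_batch(conn: sqlite3.Connection, batch: Sequence[Sequence[str]]) -> int:
    """Hypothetical helper: insert one batch using bound parameters."""
    cur = conn.cursor()
    cur.executemany(INSERT_SQL, batch)  # the driver handles quoting, no need to strip "'"
    return cur.rowcount  # for executemany, sqlite3 sums the rows modified


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accession_taxonomy("
    "accession_number TEXT, taxon_id TEXT, taxonomy TEXT)"
)
# Made-up sample rows; the second taxonomy contains quotes on purpose.
rows = [
    ("ACC1234.0", "9606", "Homo sapiens"),
    ("ACC5678.1", "562", "Escherichia coli 'K-12'"),
]
inserted = insert_batch(conn, rows)
conn.commit()
stored = conn.execute(
    "SELECT taxonomy FROM accession_taxonomy WHERE accession_number = ?",
    ("ACC5678.1",),
).fetchone()[0]
print(inserted, stored)  # 2 Escherichia coli 'K-12'
```

Separately, note that in the script above every full batch actually holds BATCH_SIZE + 1 rows: the row that triggers the flush in the inner `else` branch is appended before the INSERT runs.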

2 answers:

Answer 0 (score: 0)

There is no rollback after a commit.

Consider this:

1st Attempt 1M rows : committed
2nd Attempt 1M rows : committed
3rd Attempt 1M rows : error

You can only roll back the third attempt. The first and second are already done.
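The point can be checked with a small experiment. This sketch uses `sqlite3` in place of MySQL; a transactional engine such as InnoDB behaves the same way here, in that `rollback()` only discards work done since the last `commit()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(batch INTEGER)")

conn.executemany("INSERT INTO t VALUES (?)", [(1,)] * 3)  # 1st batch
conn.commit()                                              # now permanent
conn.executemany("INSERT INTO t VALUES (?)", [(2,)] * 3)  # 2nd batch
conn.commit()                                              # now permanent
conn.executemany("INSERT INTO t VALUES (?)", [(3,)] * 3)  # 3rd batch, not yet committed
conn.rollback()                                            # discards only the 3rd batch

counts = conn.execute(
    "SELECT batch, COUNT(*) FROM t GROUP BY batch ORDER BY batch"
).fetchall()
print(counts)  # [(1, 3), (2, 3)]
```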

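Workaround: modify your accession_taxonomy table and add a field called insertHash. On each run, your batch insert process writes a value into this field that is unique to that run, say todaysDate. If an insert step fails, you can then run: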

Delete T from accession_taxonomy T Where T.insertHash ='TheValueUSet'

So essentially it becomes:

1st Attempt 1M rows : committed
2nd Attempt 1M rows : committed
3rd Attempt 1M rows : error
Delete AllRows where insertHash = 'TheValueUSet'
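A minimal sketch of this workaround, again with `sqlite3` standing in for MySQL and assuming the table already has the extra `insertHash` column. Instead of `todaysDate` it uses a random `uuid4` value as the run tag, so two runs on the same day cannot collide; `load_with_run_tag` is a hypothetical helper name. For MySQL, swap `?` for `%s`.

```python
import sqlite3
import uuid
from typing import Iterable, List, Sequence

INSERT_SQL = (
    "INSERT INTO accession_taxonomy"
    "(accession_number, taxon_id, taxonomy, insertHash) VALUES (?, ?, ?, ?)"
)


def load_with_run_tag(conn: sqlite3.Connection,
                      batches: Iterable[List[Sequence[str]]]) -> str:
    """Commit each batch tagged with one run id; on failure delete this run's rows."""
    run_id = uuid.uuid4().hex  # unique per run, plays the role of insertHash
    try:
        for batch in batches:
            conn.executemany(INSERT_SQL, [(*row, run_id) for row in batch])
            conn.commit()  # each batch is committed, as in the question
    except Exception:
        conn.rollback()  # drop the failed, uncommitted batch
        # remove the batches of this run that were already committed
        conn.execute("DELETE FROM accession_taxonomy WHERE insertHash = ?", (run_id,))
        conn.commit()
        raise
    return run_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accession_taxonomy(accession_number TEXT UNIQUE, "
    "taxon_id TEXT, taxonomy TEXT, insertHash TEXT)"
)
# A row from an earlier, successful run must survive the cleanup.
conn.execute("INSERT INTO accession_taxonomy VALUES ('OLD1', '1', 'old', 'earlier-run')")
conn.commit()

# Made-up batches; the 3rd repeats 'A1' and violates the UNIQUE constraint.
batches = [[("A1", "1", "x")], [("A2", "2", "y")], [("A1", "3", "dup")]]
try:
    load_with_run_tag(conn, batches)
except sqlite3.IntegrityError as exc:
    print("load failed:", exc)

remaining = conn.execute("SELECT accession_number FROM accession_taxonomy").fetchall()
print(remaining)  # [('OLD1',)]
```

On a table with tens of millions of rows, an index on `insertHash` would likely make the cleanup `DELETE` much cheaper.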

That said, are you sure you want to send 1 million rows at once? Have you checked whether your server can accept a packet that large?
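In MySQL the relevant limit is the `max_allowed_packet` server variable (`SHOW VARIABLES LIKE 'max_allowed_packet'` shows its current value). One hedged way to respect it is to cut batches by size rather than by row count. `split_by_bytes` is a hypothetical helper that only counts the UTF-8 size of the field values, so the budget you pass should leave headroom for SQL syntax and protocol overhead.

```python
from typing import Iterator, List, Sequence


def split_by_bytes(rows: Sequence[Sequence[str]],
                   max_bytes: int) -> Iterator[List[Sequence[str]]]:
    """Group rows so each group's field data stays within max_bytes.

    Only the UTF-8 size of the field values is counted; a single row larger
    than max_bytes is still emitted on its own.
    """
    group: List[Sequence[str]] = []
    size = 0
    for row in rows:
        row_size = sum(len(field.encode("utf-8")) for field in row)
        if group and size + row_size > max_bytes:
            yield group  # adding this row would exceed the budget
            group, size = [], 0
        group.append(row)
        size += row_size
    if group:
        yield group


# Each made-up row is 4 + 3 + 3 = 10 bytes, so a 25-byte budget fits 2 rows.
rows = [("aaaa", "bbb", "ccc")] * 5
print([len(g) for g in split_by_bytes(rows, 25)])  # [2, 2, 1]
```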

Answer 1 (score: 0)

I would use a couple of flags to keep track of whether

  • something was inserted
  • nothing went wrong

Then use these flags to decide whether to commit or roll back, for example:

nothing_wrong_happened = True
something_was_inserted = False

for i in range(len(results)):

    # Your code that generates the query

        try:
            cursor.execute(sql)  # no db.commit() here: all batches stay in one open transaction
            something_was_inserted = True  # <-- you inserted something
        except Exception as exception:
            nothing_wrong_happened = False # <-- Something bad happened
            print(exception)
            print(f"Problem with query: {sql}")

        # the rest of your code
else:

    # Your code that generates the query

        try:
            cursor.execute(sql)
            something_was_inserted = True  # <-- you inserted something
        except Exception as exception:
            nothing_wrong_happened = False # <-- Something bad happened
            print(exception)
            print(f"Problem with query: {sql}")

        # the rest of your code

# The loop is now over
if something_was_inserted:
    if nothing_wrong_happened:
        db.commit()    # commit everything
    else:
        db.rollback()  # roll back everything
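The same all-or-nothing idea can be written as a self-contained sketch, with `sqlite3` standing in for MySQL. The essential change relative to the question's script is that nothing is committed inside the loop. The helper name `load_all_or_nothing` is hypothetical, and with MySQL it assumes the connection is not in autocommit mode.

```python
import sqlite3
from typing import Iterable, List, Sequence

INSERT_SQL = (
    "INSERT INTO accession_taxonomy(accession_number, taxon_id, taxonomy) "
    "VALUES (?, ?, ?)"
)


def load_all_or_nothing(conn: sqlite3.Connection,
                        batches: Iterable[List[Sequence[str]]]) -> int:
    """Send every batch in one transaction: one commit at the end, or a full rollback."""
    total = 0
    try:
        for batch in batches:
            conn.executemany(INSERT_SQL, batch)  # no commit inside the loop
            total += len(batch)
    except Exception:
        conn.rollback()  # undoes every batch sent so far
        raise
    conn.commit()  # reached only if every batch succeeded
    return total


def row_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM accession_taxonomy").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accession_taxonomy("
    "accession_number TEXT UNIQUE, taxon_id TEXT, taxonomy TEXT)"
)
# Made-up batches; the 3rd repeats 'A1' and violates the UNIQUE constraint.
batches = [[("A1", "1", "x")], [("A2", "2", "y")], [("A1", "3", "dup")]]
try:
    load_all_or_nothing(conn, batches)
except sqlite3.IntegrityError:
    pass
after_failure = row_count(conn)
print(after_failure)  # 0: the first two batches were rolled back too

loaded = load_all_or_nothing(conn, batches[:2])
final_count = row_count(conn)
print(loaded, final_count)  # 2 2
```

Unlike the flag version above, this stops at the first failing batch instead of attempting the remaining ones. Compared with the `insertHash` approach, the trade-off is that all 53,000,000 rows stay in a single uncommitted transaction until the very end, which may put considerable load on the server.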