Question

I'm using the csv module and its writerow method.

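Note: this is my code simplified as much as I could. I'm asking because I want to understand what is going on. I've tried to provide a Minimal, Complete, and Verifiable example as best I can.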

What I'm trying to do:

There are three tables in the database:

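MODEL_test - contains the data the algorithm learns from

my_prediction - contains the unseen data the algorithm will be applied to

OUT_predictions - contains the output of the algorithm's predict method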

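As a first step, I create a new CSV file and keep it open until I've finished with the current algorithm. Before the training iterations start, I write the first 7 values of each row of the unseen-data table into the CSV, so that this data is not duplicated. Then, after each algorithm iteration, I want to add the OUT_prediction values to the already open file.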
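A CSV file is written strictly row by row, so a column cannot be filled in after its rows have already been written. A minimal sketch of one alternative, assuming every iteration's predictions fit in memory and come in the same order as the unseen rows: keep each iteration's prediction list and write every row once, after training. `write_report`, `unseen` and `outputs` are hypothetical names, not part of the original code.

```python
import csv
import io


def write_report(out, unseen_rows, outputs_per_iteration):
    """Write the input values plus one OUTPUT column per training iteration in a single pass.

    Hypothetical helper: unseen_rows holds the input values of each row, and
    outputs_per_iteration holds one list of predictions per iteration,
    assumed to contain one prediction per unseen row, in the same order.
    """
    n_inputs = len(unseen_rows[0]) if unseen_rows else 0
    header = ['COLUMN ' + str(n) for n in range(1, n_inputs + 1)] + \
             ['OUTPUT ' + str(n) for n in range(1, len(outputs_per_iteration) + 1)]
    writer = csv.writer(out)
    writer.writerow(header)
    for i, row in enumerate(unseen_rows):
        # Take the i-th prediction from every iteration to build this row's OUTPUT cells.
        writer.writerow(list(row) + [outputs[i] for outputs in outputs_per_iteration])


# Usage: inside the training loop, append each iteration's prediction list
# to `outputs`, then call write_report once after the loop.
unseen = [(1, 2), (3, 4)]
outputs = [[0, 1], [1, 1]]
buf = io.StringIO()  # stands in for the archived/....csv file
write_report(buf, unseen, outputs)
print(buf.getvalue())
```

The trade-off is memory: all predictions are held until the end, but the file is written exactly once instead of being re-read and rewritten on every iteration.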

Code:

import csv
import datetime
import sqlite3

def export_to_csv():

    ldb = sqlite3.connect('database.db')
    c = ldb.cursor()

    table_name = 'my_predictions'

    training_size = 3

    now = datetime.datetime.now()
    file_name = str.format('my_predictions {}', now.strftime("%Y-%m-%d %H %M %S"))

    export_columns = ['COLUMN ' + str(n) for n in range(1, 8)] + \
                     ['OUTPUT ' + str(n) for n in range(1, training_size + 1)]

    with open('archived/' + file_name + '.csv', 'w', newline='') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(export_columns)
        output_writer = csv.DictWriter(csv_file, fieldnames=export_columns)

        for o in range(1, 500): # < write all unseen data from database to csv

            c.execute(str.format('SELECT * FROM {} WHERE ID=?', table_name), [o])
            fetch_one = c.fetchone()

            writer.writerow(fetch_one[1:8])  # skip ID (index 0), keep the 7 data values

        for t in range(training_size): #for each iteration write output to csv

            # some machine learning training code

            prediction = [0, 0, 1, 1, 0, 1] # <-- sample output from predictions

            combined_set = list(map(str, prediction))

            ids = 1

            for each in combined_set:
                c.execute(str.format('INSERT INTO OUTPUT_prediction VALUES ({})',
                                     ",".join(["?" for _ in range(1, len([ids] + [int(each)]) + 1)])), [ids] + [int(each)])

                ids += 1

            ldb.commit()

            for o in range(1, 500): # <-- write down output from last prediction iteration to specific column
                c.execute(str.format('SELECT * FROM {} WHERE ID=?', table_name), [o])
                fetch_output = c.fetchone()

                output_writer.writeheader()
                output_writer.writerow({'OUTPUT ' + str(t + 1): fetch_output[-1]})  # <-- columns remain empty

What's the problem

After the code finishes and I open the file, I can see that the OUTPUT columns are still empty.
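A small demonstration of why the values never land in the existing rows, using an in-memory buffer in place of the real file: each writerow call writes a complete new line at the current position, and DictWriter fills every field missing from the dict with its restval (an empty string by default).

```python
import csv
import io

buf = io.StringIO()  # stands in for the open CSV file
columns = ['COLUMN 1', 'COLUMN 2', 'OUTPUT 1']

writer = csv.writer(buf)
writer.writerow(columns)
writer.writerow(['a', 'b'])  # data rows: the OUTPUT 1 cell is left out
writer.writerow(['c', 'd'])

dict_writer = csv.DictWriter(buf, fieldnames=columns)
# This does NOT fill OUTPUT 1 of the rows above: it appends a NEW row at the
# end of the file, with the missing fields set to restval ('' by default).
dict_writer.writerow({'OUTPUT 1': 1})

print(buf.getvalue())
```

The last line comes out as `,,1`, so the output value is written, but in a fresh row below the data rather than beside it.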

(screenshot of the resulting CSV file)

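Edit: I don't want to use pandas and to_csv because it is very slow. Sometimes my unseen data has 1 million rows, and a single iteration with to_csv takes half an hour.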
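On the speed point, the loop in the code above issues one SELECT per ID and one INSERT per prediction. A sketch of doing the same work with a single query and one `executemany` call, using an in-memory SQLite database. The two-column `my_predictions` table and the `OUTPUT_prediction (ID, VALUE)` schema are made up for illustration, and whether this is fast enough on 1 million rows is untested.

```python
import csv
import io
import sqlite3

# In-memory stand-in for database.db; both table schemas are hypothetical.
ldb = sqlite3.connect(':memory:')
c = ldb.cursor()
c.execute('CREATE TABLE my_predictions (ID INTEGER PRIMARY KEY, A, B)')
c.executemany('INSERT INTO my_predictions VALUES (?, ?, ?)', [(1, 'x', 10), (2, 'y', 20)])
c.execute('CREATE TABLE OUTPUT_prediction (ID INTEGER, VALUE INTEGER)')

buf = io.StringIO()  # stands in for the archived/....csv file
writer = csv.writer(buf)
writer.writerow(['COLUMN 1', 'COLUMN 2'])

# One SELECT for all rows instead of one SELECT per ID; rows are read from the cursor as they are written.
c.execute('SELECT * FROM my_predictions ORDER BY ID')
writer.writerows(row[1:] for row in c)  # drop the ID column

# One executemany call instead of building '?,?' by hand for every INSERT.
prediction = [0, 1]
c.executemany('INSERT INTO OUTPUT_prediction VALUES (?, ?)',
              enumerate(prediction, start=1))
ldb.commit()
```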

Answer 1

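I understand what I was doing wrong, and I found a way around it, but I'm not happy with it. When I try to add a new column while the file is open in w mode, the new data is always written at the end of the file. When I call csv_file.seek(0), the old data gets overwritten.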

I also tried reopening the file in r+ mode and calling csv_file.seek(0), but got the same result.
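Both observations follow from how files work: writing at a position replaces characters in place (nothing is ever inserted), and after reading to the end, the position is at the end of the file, so further writes append. A minimal illustration with `io.StringIO`, which behaves like a text file opened in r+ mode for this purpose:

```python
import io

# Stand-in for the CSV file opened in r+ mode.
f = io.StringIO('a,b\nc,d\n')

# Writing after seek(0) overwrites characters in place; nothing is inserted.
f.seek(0)
f.write('Z')
print(repr(f.getvalue()))  # 'Z,b\nc,d\n'

# After reading everything, the position is at the end, so writes append.
f.seek(0)
f.read()
f.write('e,f\n')
print(repr(f.getvalue()))  # 'Z,b\nc,d\ne,f\n'
```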

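I'm going to use xlwings for this task because it gives me more control, but I still don't know how it will affect the speed of writing the data. My goal is to prepare a summary report containing the unseen data, the output of each iteration, and statistics.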

Solution (with r+):

now = datetime.datetime.now()
file_name = str.format('my_predictions {}', now.strftime("%Y-%m-%d %H %M %S"))

export_columns = ['COLUMN ' + str(n) for n in range(1, 8)] + \
                 ['OUTPUT ' + str(n) for n in range(1, training_size + 1)]


with open('archived/' + file_name + '.csv', 'w', newline='') as csv_file:

    writer = csv.writer(csv_file)
    writer.writerow(export_columns)

    for o in range(1, 500):

        c.execute(str.format('SELECT * FROM {} WHERE ID=?', table_name), [o])
        fetch_one = c.fetchone()

        writer.writerow(fetch_one[1:8])  # skip ID (index 0), keep the 7 data values

for t in range(training_size):

    # some machine learning training code

    prediction = [0, 0, 1, 1, 0, 1] # <-- sample output from predictions

    combined_set = list(map(str, prediction))

    # ids = 1
    #
    # for each in combined_set:
    #    c.execute(str.format('INSERT INTO OUTPUT_prediction VALUES ({})',
    #                         ",".join(["?" for _ in range(1, len([ids] + [int(each)]) + 1)])), [ids] + [int(each)])
    #
    #    ids += 1
    #
    # ldb.commit()

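    with open('archived/' + file_name + '.csv', 'r+', newline='') as csv_file:

        # Read the whole file (header + data rows) into memory.
        rows = list(csv.reader(csv_file))

        # Reading left the position at the end of the file; rewind so the
        # rewritten content replaces the old content instead of being appended.
        csv_file.seek(0)

        writer = csv.writer(csv_file)
        writer.writerow(export_columns)

        # Skip the old header row; add this iteration's output to each data row,
        # padding with '' if there are fewer predictions than rows.
        for i, row in enumerate(rows[1:]):
            row.append(combined_set[i] if i < len(combined_set) else '')
            writer.writerow(row)

        # Defensive: drop any old bytes past the new end of the file.
        csv_file.truncate()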
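The r+ approach loads the entire file into memory with `list(...)` on every iteration, which may hurt with 1 million rows. A sketch of an alternative that streams the existing CSV row by row into a temporary file in the same directory and then swaps it in with `os.replace`. `append_column` is a hypothetical helper, and unlike the code above it assumes the header row does not already list the OUTPUT columns (it adds each column name as it goes).

```python
import csv
import os
import tempfile


def append_column(path, header, values, fill=''):
    """Hypothetical helper: rewrite the CSV at `path` with one extra column.

    The file is streamed row by row into a temporary file in the same
    directory, which then replaces the original. `values` supplies one
    cell per data row; rows beyond the end of `values` get `fill`.
    """
    directory = os.path.dirname(os.path.abspath(path))
    values = iter(values)
    with open(path, newline='') as src, \
            tempfile.NamedTemporaryFile('w', newline='', dir=directory,
                                        suffix='.csv', delete=False) as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        writer.writerow(next(reader, []) + [header])  # header row + new column name
        for row in reader:
            writer.writerow(row + [next(values, fill)])
    os.replace(dst.name, path)  # swap the finished file in place of the old one


# Usage with a throwaway file standing in for archived/<file_name>.csv
workdir = tempfile.mkdtemp()
report = os.path.join(workdir, 'report.csv')
with open(report, 'w', newline='') as f:
    csv.writer(f).writerows([['COLUMN 1'], ['a'], ['b']])

append_column(report, 'OUTPUT 1', [0, 1])
append_column(report, 'OUTPUT 2', [1, 1])

with open(report, newline='') as f:
    result = list(csv.reader(f))
```

Only one row is held in memory at a time, but the whole file is still rewritten once per iteration.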

Python CSV writerow to a specific column in an already open file
