Question

I wrote this script to load the contents of a folder of TSV files into my Postgres database.

It works, but it reads each file row by row and runs one INSERT per row, which takes a long time.

Is it possible to modify it so that it runs a \COPY command instead of INSERT commands?

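I left my previous \COPY attempt in the code below (commented out). The problem with that code is that it copied the file's header line into the Postgres table as a row.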

import csv
import os
import pathlib

# conn, cur and create_table_agent are defined elsewhere in the script

def main():
    # MAKE SURE THIS IS THE RIGHT FILE TYPE
    for file in pathlib.Path().rglob('*.tsv'):
        print(os.path.abspath(file))

        # MAKE SURE THIS IS THE RIGHT TABLE
        cur.execute(create_table_agent)

        with open(file, 'r') as file_in:
            reader = csv.reader(file_in, delimiter='\t')
            next(reader)  # skip the header row
            for row in reader:
                print(row)
                cur.execute("INSERT INTO mls_agent_1_line VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)", row)
            # cur.copy_from(file_in, 'mls_appraisal_world', sep='\t', null='\\N')
        conn.commit()

    conn.close()

if __name__ == '__main__':
    main()
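The commented-out copy_from attempt can also be made to work by consuming the header line before handing the file to psycopg2. copy_from pulls data through the file object's read() method, so it starts at the file's current position. This is a minimal sketch under that assumption; copy_tsv_skip_header and RecordingCursor are hypothetical names, the table name and the '\N' null marker are taken from the question, and RecordingCursor only stands in for a real psycopg2 cursor so the example runs without a database.

```python
import io


def copy_tsv_skip_header(cur, file_in, table):
    """Hypothetical helper: skip the header line, then bulk-load the rest with COPY.

    `cur` is assumed to be a psycopg2 cursor. copy_from reads from the file's
    current position, so the header consumed here never reaches Postgres.
    """
    file_in.readline()  # consume the header row
    cur.copy_from(file_in, table, sep='\t', null='\\N')


class RecordingCursor:
    """Stand-in for a psycopg2 cursor: records what COPY would have received."""

    def __init__(self):
        self.loaded = {}

    def copy_from(self, file, table, sep='\t', null='\\N'):
        self.loaded[table] = file.read()


data = io.StringIO("id\tname\n1\tAlice\n2\t\\N\n")
cur = RecordingCursor()
copy_tsv_skip_header(cur, data, 'mls_agent_1_line')
print(repr(cur.loaded['mls_agent_1_line']))  # the header line is gone
```

Because copy_from uses COPY's text format, backslashes in the data are treated as escape sequences, so this route suits files whose only backslashes are the \N null markers.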

Answer 1

Postgres COPY can skip a header line only when the input is read in CSV format. Per the documentation:

HEADER

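Specifies that the file contains a header line with the names of each column in the file. On output, the first line contains the column names from the table, and on input, the first line is ignored. This option is allowed only when using CSV format.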
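Since HEADER is only accepted together with CSV format, the statement has to request both, plus a tab delimiter for TSV input. A small sketch that assembles such a statement; build_copy_sql and its null_marker parameter are hypothetical. Passing NULL '\N' is an assumption for files that, like the question's, mark NULLs with \N: in CSV format the default NULL representation is an unquoted empty field, so \N would otherwise be loaded as a literal string.

```python
def build_copy_sql(table, null_marker=None):
    """Hypothetical helper: COPY ... FROM STDIN for tab-separated data with a header.

    `table` is inserted as-is, so it must be a trusted identifier.
    """
    # E'\t' is a Postgres escape-string literal for the tab character.
    options = ["FORMAT csv", "HEADER true", "DELIMITER E'\\t'"]
    if null_marker is not None:
        # Double any single quotes so the marker stays a valid SQL string literal.
        options.append("NULL '{}'".format(null_marker.replace("'", "''")))
    return "COPY {} FROM STDIN WITH ({})".format(table, ", ".join(options))


print(build_copy_sql('mls_agent_1_line'))
print(build_copy_sql('mls_agent_1_line', null_marker='\\N'))
```

The parenthesized WITH (...) option list is equivalent to the older `with csv header delimiter E'\t'` form and makes each option explicit.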

If the file imports correctly with COPY using the format csv option, use the function copy_expert(sql, file, size=8192):

with open(file, 'r') as file_in:
    cur.copy_expert("copy table_name from stdin with csv header delimiter E'\t'", file_in)
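Applied to the question's folder loop, this means one copy_expert call per file instead of one INSERT per row. A sketch assuming the question's existing conn and cur objects are passed in and that the target table already exists; load_tsv_folder is a hypothetical name. It still relies on every file parsing as tab-delimited CSV, so fields containing double quotes would be interpreted as CSV quoting.

```python
import pathlib

# Table name taken from the question; CSV format is required for HEADER.
COPY_SQL = "COPY mls_agent_1_line FROM STDIN WITH (FORMAT csv, HEADER true, DELIMITER E'\\t')"


def load_tsv_folder(conn, cur, root='.'):
    """Hypothetical helper: bulk-load every *.tsv under `root`, one COPY per file.

    `conn` and `cur` are assumed to be a psycopg2 connection and cursor.
    Returns the names of the files that were loaded, in sorted path order.
    """
    loaded = []
    for path in sorted(pathlib.Path(root).rglob('*.tsv')):
        with open(path, 'r') as file_in:
            # Postgres skips the header line itself because of HEADER true.
            cur.copy_expert(COPY_SQL, file_in)
        conn.commit()  # commit per file, as the question's loop does
        loaded.append(path.name)
    return loaded
```

Committing after each file mirrors the original script; committing once after the loop would instead make the whole folder load all-or-nothing.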
