Question

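I have thousands of CSV files, each with more than 10,000 records. I am looking for the most efficient way to dump this data into a table in a Postgres DB with the least time and effort.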

Answer 1

COPY is usually the best solution, depending on your constraints.

COPY table_name FROM 'path_readable_by_postgres/file.csv' WITH (FORMAT csv);
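The COPY statement above reads a file that sits on the database server. When the files live on a client machine instead, one possible variant is COPY ... FROM STDIN, streamed once per file. The following is a minimal sketch, not part of the original answer: the helper names build_copy_sql and copy_csv_files are hypothetical, conn is assumed to be a psycopg2 connection, and the HEADER option assumes every file starts with a header row.

```python
from pathlib import Path


def build_copy_sql(table_name: str, has_header: bool = True) -> str:
    """Build a COPY ... FROM STDIN statement for CSV input."""
    # NOTE: table_name is interpolated as-is, so only pass trusted names.
    options = "FORMAT csv, HEADER true" if has_header else "FORMAT csv"
    return f"COPY {table_name} FROM STDIN WITH ({options})"


def copy_csv_files(conn, table_name: str, csv_dir: str, has_header: bool = True) -> int:
    """Stream every *.csv file in csv_dir into table_name.

    conn is assumed to be a psycopg2 connection (psycopg2.connect(...)).
    Returns the number of files sent to the server.
    """
    copy_sql = build_copy_sql(table_name, has_header)
    loaded = 0
    with conn.cursor() as cur:
        for path in sorted(Path(csv_dir).glob("*.csv")):
            with open(path, "r", encoding="utf-8", newline="") as f:
                # copy_expert sends the file contents as the COPY input stream.
                cur.copy_expert(copy_sql, f)
            loaded += 1
    # One commit at the end: either all files are loaded or none are.
    conn.commit()
    return loaded
```

A call might look like copy_csv_files(psycopg2.connect(...), 'table_name', 'path/to/csv_dir'). Committing once at the end keeps the load atomic; committing per file would be the alternative if partial progress should survive a failure.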

You can cat the files together into one large file to import the data quickly.
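A plain cat repeats the header row of every file in the merged output, if the files have headers. As a rough sketch of one way around that, the hypothetical helper below (concat_csv_files is not from the original answer) keeps the header of the first file only. It assumes all files share the same columns and are UTF-8 encoded.

```python
from pathlib import Path


def concat_csv_files(csv_dir: str, out_path: str, has_header: bool = True) -> int:
    """Merge every *.csv file in csv_dir into out_path.

    When has_header is True, only the first file's header row is kept.
    Returns the number of files merged.
    """
    out = Path(out_path).resolve()
    # Skip the output file itself in case it already exists inside csv_dir.
    files = [p for p in sorted(Path(csv_dir).glob("*.csv")) if p.resolve() != out]
    with open(out, "w", encoding="utf-8", newline="") as dst:
        for i, path in enumerate(files):
            with open(path, "r", encoding="utf-8", newline="") as src:
                if has_header:
                    header = src.readline()
                    if i == 0:
                        dst.write(header)
                body = src.read()
            dst.write(body)
            # Make sure the next file starts on a fresh line.
            if body and not body.endswith("\n"):
                dst.write("\n")
    return len(files)
```

The merged file can then be loaded with a single COPY, using the HEADER option if a header row was kept.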

See https://www.postgresql.org/docs/current/static/sql-copy.html for more details.

Answer 2

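You can use the pandas library to read and (if needed) transform the data, sqlalchemy to create a Postgres engine, and psycopg2 to load the data into PostgreSQL. I assume you have already created the table in the Postgres DB. Try the code below.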

import pandas as pd
import psycopg2
from sqlalchemy import create_engine

# Read the CSV; replace the path and the index column name with your own
pd_table = pd.read_csv('path/to/file.csv', index_col='index_column')
# Drop "Unnamed: 0" if it is present, as it often causes problems in writing to table
pd_table = pd_table.drop(columns=['Unnamed: 0'], errors='ignore')
# Now simply load your data into database
engine = create_engine('postgresql://user:password@host:port/database')
try:
    pd_table.to_sql('name_of_table_in_postgres_db', engine, if_exists='append')
except (Exception, psycopg2.DatabaseError) as error:
    print(error)
finally:
    # Release the engine's pooled connections before reporting them closed
    engine.dispose()
    print('Closed connection to the database')
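The example above loads a single file, while the question is about thousands of them. One way to extend it is to loop over a directory, as in the sketch below. This is an illustration only: load_csv_dir is a hypothetical helper, con stands for the SQLAlchemy engine created above, index=False assumes the DataFrame index should not be written as a column, and the chunksize value is an arbitrary choice.

```python
from pathlib import Path

import pandas as pd


def load_csv_dir(csv_dir: str, table_name: str, con, chunksize: int = 10_000) -> int:
    """Append every *.csv file in csv_dir to table_name.

    con is assumed to be a SQLAlchemy engine or connection.
    Returns the total number of rows written.
    """
    total = 0
    for path in sorted(Path(csv_dir).glob("*.csv")):
        frame = pd.read_csv(path)
        # Drop the stray index column if an earlier export left one behind.
        frame = frame.drop(columns=["Unnamed: 0"], errors="ignore")
        # chunksize limits how many rows are written per batch.
        frame.to_sql(table_name, con, if_exists="append", index=False, chunksize=chunksize)
        total += len(frame)
    return total
```

For example, load_csv_dir('path/to/csv_dir', 'name_of_table_in_postgres_db', engine). For very large volumes, this route is likely to be slower than COPY, since pandas issues INSERT statements.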

Automatically dumping CSVs into a new Postgres table
