Question

I have Python code that inserts CSV data, created from a DataFrame, into Postgres. My data has not been cleaned, so I check for errors while inserting it into the database row by row.

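I often get errors such as value too long for type character varying(15), and so on. That's fine, but whenever I get an error, none of my data is inserted at all. I want the code to continue inserting data when an error occurs instead of stopping completely...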


import psycopg2
import pandas as pd
from io import StringIO


def df2db(conn: psycopg2.extensions.connection, df: pd.DataFrame, table: str):
    columns = list(df.columns)
    buf = StringIO()
    df.to_csv(buf, sep='\t', na_rep='\\N', index=False, header=False)
    buf.seek(0)
    cursor = conn.cursor()
    for row in buf:
        row = row.replace("\n", "").split("\t")
        row = ["'" + val + "'" for val in row]
        try:
            cursor.execute(f"INSERT INTO {table} ({','.join(columns)}) VALUES({','.join(row)}) ON CONFLICT DO NOTHING")
        except psycopg2.Error:
            conn.rollback()
            continue # here it continues, but my data are not inserted
    cursor.close()

By the way, I have about 20 million records, so I can't afford any expensive processing.

Answer 1

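No "expensive" processing is needed. Just cut each string to its field length according to the database schema, either on the Python side or with a function in the query.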
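A minimal sketch of the Python-side option, assuming a psycopg2 connection and a pandas DataFrame as in the question. The helper names fetch_max_lengths and truncate_to_schema are hypothetical. The declared lengths are read from information_schema.columns (character_maximum_length), and the lookup assumes an unqualified table name that matches the name as stored (normally lower case).

```python
def fetch_max_lengths(conn, table):
    """Map column name -> declared length for length-limited columns.

    Hypothetical helper; assumes a psycopg2-style connection (%s placeholders)
    and an unqualified, as-stored table name.
    """
    cur = conn.cursor()
    try:
        cur.execute(
            "SELECT column_name, character_maximum_length "
            "FROM information_schema.columns "
            "WHERE table_name = %s AND character_maximum_length IS NOT NULL",
            (table,),
        )
        return dict(cur.fetchall())
    finally:
        cur.close()


def truncate_to_schema(df, max_lengths):
    """Return a copy of df with every string cut to its column's maximum length."""
    out = df.copy()
    for col, n in max_lengths.items():
        if col in out.columns:
            # Only str values are cut; NaN/None and non-string values pass through.
            out[col] = out[col].map(lambda v, n=n: v[:n] if isinstance(v, str) else v)
    return out
```

Usage would be something like df2db(conn, truncate_to_schema(df, fetch_max_lengths(conn, "my_table")), "my_table"), where my_table is a placeholder. For the in-query variant, wrapping a value in left(value, 15) or casting it explicitly with ::varchar(15) also works, since PostgreSQL silently truncates an over-length value on an explicit cast to character varying(n).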

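But I would do it differently: load the CSV as-is into a staging table using the pg tools, pgAdmin, or the COPY SQL statement, because that is very fast, and then run a single query that copies the data over into the real table, cutting each string to its maximum length.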
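The staging-table approach could look roughly like this. It is a sketch, assuming psycopg2 (cursor.copy_expert and the connection/cursor context managers) and a max_lengths dict mapping each VARCHAR column to its declared length, for example read from information_schema.columns. The names quote_ident, build_staging_sql, load_via_staging and staging_load are hypothetical, and table names are assumed to be unqualified. As a variation on loading everything "as is", the staging table copies the target's definition with LIKE and widens only the length-limited columns to text, so the other columns keep their types and the final INSERT ... SELECT needs no extra casts.

```python
import io


def quote_ident(name: str) -> str:
    """Quote one PostgreSQL identifier (unqualified name assumed)."""
    return '"' + name.replace('"', '""') + '"'


def build_staging_sql(table, columns, max_lengths, staging="staging_load"):
    """Return (setup statements, COPY statement, INSERT ... SELECT statement)."""
    t, s = quote_ident(table), quote_ident(staging)
    col_list = ", ".join(quote_ident(c) for c in columns)
    # Same columns as the target; the temp table disappears at commit.
    setup = [f"CREATE TEMP TABLE {s} (LIKE {t}) ON COMMIT DROP"]
    # Widen length-limited columns so COPY never fails on long strings.
    setup += [
        f"ALTER TABLE {s} ALTER COLUMN {quote_ident(c)} TYPE text"
        for c in columns
        if c in max_lengths
    ]
    # CSV format, with \N as NULL to match na_rep="\\N" below.
    copy = f"COPY {s} ({col_list}) FROM STDIN WITH (FORMAT csv, NULL '\\N')"
    # Cut each length-limited column with left(); pass the others through.
    select = ", ".join(
        f"left({quote_ident(c)}, {int(max_lengths[c])})" if c in max_lengths else quote_ident(c)
        for c in columns
    )
    insert = (
        f"INSERT INTO {t} ({col_list}) SELECT {select} FROM {s} "
        "ON CONFLICT DO NOTHING"
    )
    return setup, copy, insert


def load_via_staging(conn, df, table, max_lengths):
    """Bulk-load a DataFrame through a temp staging table (psycopg2 assumed)."""
    setup, copy, insert = build_staging_sql(table, list(df.columns), max_lengths)
    buf = io.StringIO()
    df.to_csv(buf, index=False, header=False, na_rep="\\N")
    buf.seek(0)
    with conn:  # psycopg2: commit on success, roll back on exception
        with conn.cursor() as cur:
            for stmt in setup:
                cur.execute(stmt)
            cur.copy_expert(copy, buf)
            cur.execute(insert)
```

Widening only the VARCHAR columns is deliberate: PostgreSQL does not implicitly cast text to types such as integer on assignment, so an all-text staging table would need an explicit cast for every non-string column in the SELECT.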

See this Q&A for more details.

Handling errors when inserting data into Postgres SQL