Question

I have the following table in Postgres:

   Column   |            Type             | Modifiers 
------------+-----------------------------+-----------
 customer   | text                        | 
 feature    | character varying(255)      | 
 values     | character varying[]         | 
 updated_ts | timestamp without time zone |

I am trying to write the following pandas DataFrame to it:

    customer     feature                       values           updated_ts
0     A             B                       [red, black]     2019-01-15 00:00:00 
1     A             B                       [blue, green]    2019-01-16 00:00:00

using the following code:

import psycopg2
...    
sio = BytesIO()
sio.write(df.to_csv(header=False, index=False, sep='\t', quoting=csv.QUOTE_NONE))
sio.seek(0)
with connection.cursor() as cursor: 
    cursor.copy_from(file=sio, table=table, columns=df.columns, sep='\t', null='')
    connection.commit()

But I get the following error:

DataError('malformed array literal: "[\'red\', \'black\']"\nDETAIL:  "[" must introduce explicitly-specified array dimensions.\nCONTEXT:  COPY test_features_values, line 1, column values: "[\'red\', \'black\']"\n',)

How can I write this correctly?

Answer 1

I think you need to convert the lists to sets:

df['values'] = df['values'].apply(set)

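to make the insert work. The reason is that PostgreSQL expects arrays to be inserted using brace ({}) notation, not bracket ([]) notation. When you convert a list to a set, to_csv writes the set with curly braces, which happens to match the layout PostgreSQL expects (a pleasant surprise; the other approaches I have seen end up as much hackier conversions).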
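A caveat on the set trick, as far as I can tell: the string form of a set of strings looks like {'red', 'black'}, and PostgreSQL's array input only treats double quotes as quoting characters, so the single quotes may end up stored as part of each element. Sets also drop duplicates and do not guarantee element order. Below is a minimal sketch of building the brace literal explicitly instead. `to_pg_array` is a hypothetical helper name of my own, not part of pandas or psycopg2.

```python
def to_pg_array(items):
    """Render a Python list as a PostgreSQL array literal such as {"red","black"}."""
    parts = []
    for item in items:
        if item is None:
            # An unquoted NULL is how an array literal marks a null element
            parts.append("NULL")
        else:
            # Inside a double-quoted element, backslash and double quote
            # must each be preceded by a backslash
            text = str(item).replace("\\", "\\\\").replace('"', '\\"')
            parts.append('"' + text + '"')
    return "{" + ",".join(parts) + "}"


print(to_pg_array(["red", "black"]))  # {"red","black"}
```

The result is the text form PostgreSQL accepts for a character varying[] value, with the original element order kept. If the string is sent through COPY's text format, backslashes need one more level of escaping, because COPY treats the backslash as its own escape character.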

One other thing I'd note is that, to get this working, I had to switch from BytesIO to StringIO, because df.to_csv(...) returns a str, which is not a bytes-like object.
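To illustrate the BytesIO point without pandas, here is a small sketch on Python 3. The `text` variable is a stand-in I made up for the str that to_csv returns when no path is given.

```python
from io import BytesIO, StringIO

# Stand-in for the str that df.to_csv(...) returns when called without a path
text = "A\tB\t{red,black}\t2019-01-15 00:00:00\n"

# BytesIO only accepts bytes-like objects, so writing a str fails on Python 3
try:
    BytesIO().write(text)
    bytes_io_accepts_str = True
except TypeError:
    bytes_io_accepts_str = False

# StringIO takes the str directly
sio = StringIO()
sio.write(text)
sio.seek(0)

# If a binary buffer is really needed, encode the text first
bio = BytesIO(text.encode("utf-8"))

print(bytes_io_accepts_str)
```

Either the StringIO or the encoded BytesIO gives a readable file-like object; the original failure comes only from handing a str to BytesIO.write.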

With those changes, the insert succeeds:

import csv
import pandas
import psycopg2
from io import StringIO 

# initialize connection
connection = psycopg2.connect('postgresql://scott:tiger@localhost:5432/mydatabase')

# create data
df = pandas.DataFrame({
    'customer': ['A', 'A'],
    'feature': ['B', 'B'],
    'values': [['red', 'black'], ['blue', 'green']],
    'updated_ts': ['2019-01-15 00:00:00', '2019-01-16 00:00:00']
})
# cast list to set
df['values'] = df['values'].apply(set)

# write data to postgres
sio = StringIO()
sio.write(df.to_csv(header=False, index=False, sep='\t', quoting=csv.QUOTE_NONE))
sio.seek(0)
with connection.cursor() as cursor: 
    cursor.copy_from(file=sio, table='test', columns=df.columns, sep='\t', null='')
    connection.commit()
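If you would rather not depend on how to_csv happens to render each value, one option is to build the COPY input yourself. The sketch below assumes tab-separated COPY text format with null='' (as in the copy_from call above) and assumes the array column has already been turned into a brace literal string. `copy_escape` and `rows_to_copy_buffer` are hypothetical helper names, not psycopg2 functions.

```python
from io import StringIO


def copy_escape(field, null=""):
    """Escape one field for tab-separated COPY text format."""
    if field is None:
        return null
    text = str(field)
    # Backslash first, then the characters COPY treats as structural
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def rows_to_copy_buffer(rows, null=""):
    """Write rows as tab-separated lines into a StringIO, rewound for reading."""
    buf = StringIO()
    for row in rows:
        buf.write("\t".join(copy_escape(f, null) for f in row) + "\n")
    buf.seek(0)
    return buf


rows = [
    ("A", "B", '{"red","black"}', "2019-01-15 00:00:00"),
    ("A", "B", '{"blue","green"}', "2019-01-16 00:00:00"),
]
buf = rows_to_copy_buffer(rows)
print(buf.getvalue())
```

The returned buffer can be passed as the file argument of cursor.copy_from(..., sep='\t', null='') in place of sio.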

Writing arrays to Postgres with Python's psycopg2
