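I am building a tool that ingests CSV files into a Postgres database, but I cannot get it to handle null values. The error occurs because an int-typed field is empty in the source CSV file. If possible, I would like to handle this in Python and avoid any changes to the CSV extract. A simplified version of the CSV format is below:
My fields and formats: Field1 (Int), Field2 (Varchar)
Sample CSV snapshot: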
"1","abc,,sdas""ds,dsd,a"sdasdasda"
"","asdasd,"",,""<"<"//"
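I have looked at the copy_from and copy_expert options. However, copy_from does not allow quote-enclosed fields, and copy_expert has no null handling. I also tried replacing the nulls with pandas, but pandas does not parse the fields that contain multiple quotes either.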
# pandas fail
import csv
import pandas as pd
flights = pd.read_csv('sample.csv', sep=r',\s*', skipinitialspace=True, quoting=csv.QUOTE_ALL, engine='python')
flights.shape
ParserError: Expected 8 fields in line 6, saw 9. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
import pandas as pd
flights = pd.read_csv('sample.csv', sep=',')
flights.shape
ParserError: Error tokenizing data. C error: Expected 8 fields in line 6, saw 9
# copy_expert fail
import psycopg2
conn = psycopg2.connect(user="user",
                        password="password",
                        host="1.1.1.1",
                        port="1111",
                        database="Test_1")
cur = conn.cursor()
with open('sample.csv', 'r') as f:
    cur.copy_expert("""COPY abcd FROM STDIN WITH (FORMAT CSV)""", f)
conn.commit()
DataError: invalid input syntax for integer: ""
# copy_from fail
import psycopg2
conn = psycopg2.connect(user="user",
                        password="password",
                        host="1.1.1.1",
                        port="1111",
                        database="Test_1")
cur = conn.cursor()
with open('sample.csv', 'r') as f:
    cur.copy_from(f, 'abcd', sep=',', null='None')
conn.commit()
DataError: invalid input syntax for integer: ""
What I expect is for Postgres to accept the file and load the rows as below:
Field1 (Int),Field2 (Varchar)
1,abc,,sdas""ds,dsd,a"sdasdasda
,asdasd,"",,,"""<"<"//
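One direction that might work without touching the CSV extract is to rewrite the rows in memory before handing them to COPY. The sketch below is only an illustration under assumptions: the helper names `quote_field` and `blank_ints_to_null` are hypothetical, the int column is assumed to be the first one, and the sample rows are simplified, well-formed versions of the snapshot above. It relies on the documented behaviour that COPY in CSV format reads an unquoted empty field as NULL by default.

```python
import csv
import io


def quote_field(value):
    """Quote one CSV field, doubling any embedded double quotes."""
    return '"' + value.replace('"', '""') + '"'


def blank_ints_to_null(src, int_columns=(0,)):
    """Rewrite CSV text so empty values in int_columns become unquoted.

    src is any iterable of CSV lines (an open file or StringIO).
    int_columns holds the zero-based positions of the int fields
    (assumed here to be just the first column).
    """
    out = io.StringIO()
    for row in csv.reader(src):
        if not row:
            continue  # skip completely blank lines
        cells = [
            "" if (i in int_columns and value.strip() == "") else quote_field(value)
            for i, value in enumerate(row)
        ]
        out.write(",".join(cells) + "\n")
    out.seek(0)  # rewind so a consumer can read from the start
    return out


# Simplified, well-formed stand-in for the sample file.
sample = io.StringIO('"1","abc,,sdas""ds"\n"","asdasd,"",,"\n')
cleaned = blank_ints_to_null(sample)
print(cleaned.getvalue())
```

The returned buffer could then presumably be passed in place of the file object, for example `cur.copy_expert("COPY abcd FROM STDIN WITH (FORMAT CSV)", cleaned)`; this has not been run against the real table. When reading from disk, opening the file with `newline=''` is what the `csv` module recommends. PostgreSQL's COPY documentation also lists a `FORCE_NULL` option for CSV format that treats quoted empty strings in the named columns as NULL, which may avoid the rewrite entirely if the server version supports it.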