Table structure

MySQL table DDL:

create table orderitems
(
  id                         char(36) collate utf8_bin not null
    primary key,
  store_id                   char(36) collate utf8_bin not null,
  ref_type                   int                       not null,
  ref_id                     char(36) collate utf8_bin not null,
  store_product_id           char(36) collate utf8_bin not null,
  product_id                 char(36) collate utf8_bin not null,
  product_name               varchar(50)               null,
  main_image                 varchar(200)              null,
  price                      int                       not null,
  count                      int                       not null,
  logistics_type             int                       not null,
  time_create                bigint                    not null,
  time_update                bigint                    not null,
  ...
);

I created the table in Redshift with the same SQL, but importing the CSV fails.

My code for importing the CSV into Redshift (Python)

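import pandas as pd
import psycopg2
import smart_open

# The Parquet file was dumped by Sqoop.
# path, to_table, CONN_STRING and AWS_IAM_ROLE are not shown here.
p2 = 'xxx'  # destination of the intermediate CSV
df = pd.read_parquet(path)

# Write the DataFrame as CSV (to_csv encodes as UTF-8 by default).
with smart_open.smart_open(p2, 'w') as f:
    df.to_csv(f, index=False)

conn = psycopg2.connect(CONN_STRING)

# Load the CSV into Redshift, skipping the header row.
sql = """COPY %s FROM '%s' credentials 'aws_iam_role=%s' region 'cn-north-1'
delimiter ',' FORMAT AS CSV IGNOREHEADER 1 ; commit ;""" % (to_table, p2, AWS_IAM_ROLE)
print(sql)
cur = conn.cursor()
cur.execute(sql)
conn.close()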

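Error

Checking STL_LOAD_ERRORS shows an error on the product_name column:

raw_field_value: ................................................... 215g/ ...
err_code: 1204
err_reason: String length exceeds DDL length

The real value is 伊利畅轻蔓越莓奇亚籽风味发酵乳215g/瓶 (Chinese).

So it looks like some kind of encoding problem. Since MySQL is UTF-8 and the CSV is also UTF-8, I don't know what is going wrong.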

Answer 1

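Your column is a varchar with length 50. In Redshift that means 50 bytes, not 50 characters. MySQL counts varchar length in characters, which is why the same value fits there. The example string you gave contains 16 Chinese characters, each 3 bytes in UTF-8, plus 5 ASCII characters (1 byte each), so 53 bytes. That is longer than the column's byte length, so the import fails.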
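The byte count can be checked directly in Python. The following is only a minimal sketch: `utf8_len` and `find_overflows` are hypothetical helper names, and the `sample` list is made-up data standing in for the product_name column.

import-free, standard library only:

```python
def utf8_len(value: str) -> int:
    """Return the number of bytes the string occupies when encoded as UTF-8."""
    return len(value.encode("utf-8"))


def find_overflows(values, max_bytes):
    """Return (value, byte_length) pairs whose UTF-8 size exceeds max_bytes."""
    return [(v, utf8_len(v)) for v in values
            if v is not None and utf8_len(v) > max_bytes]


name = "伊利畅轻蔓越莓奇亚籽风味发酵乳215g/瓶"
print(len(name))       # 21 characters: fits a MySQL varchar(50)
print(utf8_len(name))  # 53 bytes: too long for a Redshift varchar(50)

# Hypothetical sample standing in for the product_name column.
sample = [name, "milk 250ml", None]
print(find_overflows(sample, max_bytes=50))
```

With the DataFrame from the question, something like `find_overflows(df["product_name"].dropna(), 50)` should list the offending rows before running COPY. The usual fix is to widen the Redshift column: assuming the MySQL column uses the 3-byte `utf8` character set, varchar(150) covers every possible 50-character value, and varchar(200) would cover 4-byte characters as well.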
