Question

I'm having trouble loading some CSV files into a Postgres table. My data looks like this:

ID,IS_ALIVE,BODY_TEXT
123,true,Hi Joe, I am looking for a new vehicle, can you help me out?

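The problem is that the text in what should be the BODY_TEXT column is unstructured email data and can contain any kind of character. When I run the following COPY command, it fails because BODY_TEXT contains multiple , characters.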

COPY sent from ('my_file.csv') DELIMITER ',' CSV;

How can I solve this so that everything in the BODY_TEXT column is loaded as-is, without the load command treating the characters inside it as delimiters?

Answer 1

Besides fixing the source file format, you can also do this in PostgreSQL itself.

Load all lines from the file into a temporary table:

create temporary table t (x text);
copy t from 'foo.csv';

Then you can split each string with a regular expression, for example:

select regexp_matches(x, '^([0-9]+),(true|false),(.*)$') from t;

                              regexp_matches                               
---------------------------------------------------------------------------
 {123,true,"Hi Joe, I am looking for a new vehicle, can you help me out?"}
 {456,false,"Hello, honey, there is what I want to ask you."}
(2 rows)
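
To see why the pattern keeps the commas inside the body intact, the same expression can be tried outside the database. The sketch below uses Python's `re` module rather than PostgreSQL, so it only illustrates the matching logic; the helper name `split_row` is hypothetical.

```python
import re

# Same pattern as in the query above: the first two groups cannot contain
# commas, so (.*) receives everything after the second comma.
ROW_PATTERN = re.compile(r'^([0-9]+),(true|false),(.*)$')

def split_row(line):
    """Return (id, is_alive, body_text) as strings, or None if the line does not match."""
    match = ROW_PATTERN.match(line)
    return match.groups() if match else None

print(split_row('123,true,Hi Joe, I am looking for a new vehicle, can you help me out?'))
print(split_row('ID,IS_ALIVE,BODY_TEXT'))  # the header line does not match
```

Lines that do not match, such as the header, yield nothing here, which mirrors how regexp_matches returns no row for a non-matching string.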

You can use this query to load the data into the target table:

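-- regexp_matches returns text[]; the casts assume id is an integer column
-- and is_alive is a boolean column
insert into sent(id, is_alive, body_text)
  select x[1]::int, x[2]::boolean, x[3]
  from (
    select regexp_matches(x, '^([0-9]+),(true|false),(.*)$') as x
    from t) t;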
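
The other route mentioned at the start of this answer, fixing the source file, could look roughly like the following Python sketch. It assumes every data row has the layout id,is_alive,body, that the first line is a header, and that no record spans multiple lines; the function name `quote_body_column` and any file names passed to it are hypothetical.

```python
import csv

def quote_body_column(src_path, dst_path):
    """Rewrite a file whose third column contains unquoted commas as standard CSV."""
    with open(src_path, newline='', encoding='utf-8') as src, \
         open(dst_path, 'w', newline='', encoding='utf-8') as dst:
        writer = csv.writer(dst, quoting=csv.QUOTE_MINIMAL)
        for line in src:
            line = line.rstrip('\r\n')
            if not line:
                continue  # skip blank lines
            # Split on the first two commas only; the remainder is the body text.
            fields = line.split(',', 2)
            # csv.writer quotes the body when it contains commas or double quotes.
            writer.writerow(fields)
```

The rewritten file should then load with an ordinary COPY ... CSV HEADER command, since the body text is quoted wherever it contains commas or double quotes.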
