PostgreSQL COPY from CSV with delimiter "|"

Posted: 2013-04-10 13:41:41

Tags: postgresql

Please, can anyone help me with this problem?

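I want to create a table in a Postgres database from a CSV file that uses "|" as the delimiter, but when I try to load it with the COPY command (or with Import) I get this error: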

ERROR:  extra data after last expected column
CONTEXT:  COPY twitter, line 2: ""Sono da Via Martignacco 
http://t.co/NUC6MP0z|"<a href=""http://foursquare.com"" rel=""nofollow"">f..."

The first two lines of the CSV:

txt|"source"|"ulang"|"coords"|"tweettime_wtz"|"country"|"id"|"userid"|"in_reply_user_id"|"in_reply_status_id"|"uname"|"ucreationdate"|"utimezone"|"followers_count"|"friends_count"|"x_coords"|"y_coords"
Sono da Via Martignacco http://t.co/NUC6MP0z|"<a href=""http://foursquare.com"" rel=""nofollow"">foursquare</a>"|"it"|"0101000020E6100000191CA9E7726F2A4026C1E1269F094740"|"2012-05-13 10:00:45+02"|112|201582743333777411|35445264|""|""|"toffo93"|"2009-04-26 11:00:03"|"Rome"|1044|198|13.21767353|46.07516943
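A quick way to see how many fields each of these lines contains is to parse them the way a CSV reader would. Below is a minimal Python sketch using the standard csv module, assuming "|" as the delimiter and '"' as the quote character, with "" inside a quoted field meaning one literal quote (which is also how PostgreSQL's CSV format treats quotes by default). The two lines are pasted in from above.

```python
import csv
import io

# The header and first data row exactly as shown in the question.
sample = (
    'txt|"source"|"ulang"|"coords"|"tweettime_wtz"|"country"|"id"|"userid"|'
    '"in_reply_user_id"|"in_reply_status_id"|"uname"|"ucreationdate"|'
    '"utimezone"|"followers_count"|"friends_count"|"x_coords"|"y_coords"\n'
    'Sono da Via Martignacco http://t.co/NUC6MP0z|'
    '"<a href=""http://foursquare.com"" rel=""nofollow"">foursquare</a>"|'
    '"it"|"0101000020E6100000191CA9E7726F2A4026C1E1269F094740"|'
    '"2012-05-13 10:00:45+02"|112|201582743333777411|35445264|""|""|'
    '"toffo93"|"2009-04-26 11:00:03"|"Rome"|1044|198|13.21767353|46.07516943\n'
)

# Split on "|" and treat '"' as the quote character; a doubled quote ("")
# inside a quoted field is read back as a single literal quote.
reader = csv.reader(io.StringIO(sample), delimiter="|", quotechar='"')
header, row = list(reader)

print(len(header), len(row))  # 17 17
```

Both lines split into 17 fields, so the header and the data row agree with each other.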

For this data I created a table "twitter" in Postgres:

CREATE TABLE public.twitter
(
  txt character varying(255),
  source character varying(255),
  ulang character varying(255),
  coords geometry(Point,4326),
  tweettime_wtz character varying(255),
  country integer,
  userid integer NOT NULL,
  in_reply_user_id character varying(255),
  in_reply_status_id character varying(255),
  uname character varying(255),
  ucreationdate character varying(255),
  utimezone character varying(255),
  followers_count integer,
  friends_count integer,
  x_coords numeric,
  y_coords numeric,
  CONSTRAINT id PRIMARY KEY (userid)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE public.twitter
  OWNER TO postgres;

Any ideas, guys?

2 answers:

Answer 0 (score: 0)

The target table has 16 columns, but your file has 17.

The id field seems to be missing.
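COPY without an explicit column list assigns fields to table columns by position, so one extra field in every row is exactly what produces "extra data after last expected column". A minimal Python sketch that compares the file's header with the column names from the original CREATE TABLE (both lists copied from the question) shows which one is missing:

```python
import csv
import io

# Column names from the CSV header line in the question.
header_line = (
    'txt|"source"|"ulang"|"coords"|"tweettime_wtz"|"country"|"id"|"userid"|'
    '"in_reply_user_id"|"in_reply_status_id"|"uname"|"ucreationdate"|'
    '"utimezone"|"followers_count"|"friends_count"|"x_coords"|"y_coords"'
)
file_columns = next(
    csv.reader(io.StringIO(header_line), delimiter="|", quotechar='"')
)

# Column names from the original CREATE TABLE public.twitter statement.
table_columns = [
    "txt", "source", "ulang", "coords", "tweettime_wtz", "country",
    "userid", "in_reply_user_id", "in_reply_status_id", "uname",
    "ucreationdate", "utimezone", "followers_count", "friends_count",
    "x_coords", "y_coords",
]

# Columns present in the file but absent from the table, in file order.
missing = [c for c in file_columns if c not in table_columns]

print(len(file_columns), len(table_columns), missing)  # 17 16 ['id']
```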

Try defining the table as:

CREATE TABLE public.twitter
(
  txt character varying(255),
  source character varying(255),
  ulang character varying(255),
  coords geometry(Point,4326),
  tweettime_wtz character varying(255),
  country integer,
  id character varying,
  userid integer NOT NULL,
  in_reply_user_id character varying(255),
  in_reply_status_id character varying(255),
  uname character varying(255),
  ucreationdate character varying(255),
  utimezone character varying(255),
  followers_count integer,
  friends_count integer,
  x_coords numeric,
  y_coords numeric,
  CONSTRAINT twitter_pk PRIMARY KEY (userid)
)
WITH (
  OIDS=FALSE
);

Change the data type of the id field as needed.
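For choosing that type, the sample row helps: its id value, 201582743333777411, is far above the maximum of integer (2147483647) but within the range of bigint (up to 9223372036854775807), so bigint would store it as a number, while character varying, as in the table above, stores it as text. The helper below, smallest_int_type, is a hypothetical Python sketch (not part of PostgreSQL) that picks the narrowest PostgreSQL integer type able to hold a given value:

```python
# Ranges of PostgreSQL's integer types (smallint, integer, bigint).
PG_INT_RANGES = [
    ("smallint", -32768, 32767),
    ("integer", -2147483648, 2147483647),
    ("bigint", -9223372036854775808, 9223372036854775807),
]


def smallest_int_type(value: int) -> str:
    """Hypothetical helper: narrowest PostgreSQL integer type for value."""
    for name, low, high in PG_INT_RANGES:
        if low <= value <= high:
            return name
    # Beyond bigint's range only the arbitrary-precision numeric type fits.
    return "numeric"


# Sample values taken from the question's data row.
print(smallest_int_type(201582743333777411))  # bigint  (the id column)
print(smallest_int_type(35445264))            # integer (the userid column)
```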

Answer 1 (score: 0)

My solution:

So the problem was in my CSV file: it contained hidden quotation marks. I didn't see them when I opened the CSV in Excel, where the lines looked like this:

txt|"source"|"ulang"|"coords"|"tweettime_wtz"|"country"|"id"|"userid"|"in_reply_user_id"|"in_reply_status_id"|"uname"|"ucreationdate"|"utimezone"|"followers_count"|"friends_count"|"x_coords"|"y_coords"
Sono da Via Martignacco http://t.co/NUC6MP0z|"<a href=""http://foursquare.com"" rel=""nofollow"">foursquare</a>"|"it"|"0101000020E6100000191CA9E7726F2A4026C1E1269F094740"|"2012-05-13 10:00:45+02"|112|201582743333777411|35445264|""|""|"toffo93"|"2009-04-26 11:00:03"|"Rome"|1044|198|13.21767353|46.07516943

But when I opened the CSV in Notepad, it looked different:

"txt"|"source"|"ulang"|"coords"|"tweettime_wtz"|"country"|"id"|"userid"|"in_reply_user_id"|"in_reply_status_id"|"uname"|"ucreationdate"|"utimezone"|"followers_count"|"friends_count"|"x_coords"|"y_coords"
"Sono da Via Martignacco http://t.co/NUC6MP0z"|"<a href=""http://foursquare.com"" rel=""nofollow"">foursquare</a>"|"it"|"0101000020E6100000191CA9E7726F2A4026C1E1269F094740"|"2012-05-13 10:00:45+02"|112|201582743333777411|35445264|""|""|"toffo93"|"2009-04-26 11:00:03"|"Rome"|1044|198|13.21767353|46.07516943
"

So I had to remove all the quotes (in Notepad, then save the file as CSV again), so that the text became:

txt|source|ulang|coords|tweettime_wtz|country|id|userid|in_reply_user_id|in_reply_status_id|uname|ucreationdate|utimezone|followers_count|friends_count|x_coords|y_coords
Sono da Via Martignacco http://t.co/NUC6MP0z|<a href=http://foursquare.com rel=nofollow>foursquare</a>|it|0101000020E6100000191CA9E7726F2A4026C1E1269F094740|2012-05-13 10:00:45+02|112|201582743333777411|35445264|||toffo93|2009-04-26 11:00:03|Rome|1044|198|13.21767353|46.07516943
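One side effect of deleting every quote by hand shows up in the source field: the escaped quotes around the href and rel attribute values are removed as well. A minimal Python sketch, using the standard csv module with the same assumed "|" delimiter and '"' quote character, parses the Notepad version and the stripped version of the data row side by side:

```python
import csv
import io

# The data row as seen in Notepad: quoted fields, "" as an escaped quote.
quoted = (
    '"Sono da Via Martignacco http://t.co/NUC6MP0z"|'
    '"<a href=""http://foursquare.com"" rel=""nofollow"">foursquare</a>"|'
    '"it"|"0101000020E6100000191CA9E7726F2A4026C1E1269F094740"|'
    '"2012-05-13 10:00:45+02"|112|201582743333777411|35445264|""|""|'
    '"toffo93"|"2009-04-26 11:00:03"|"Rome"|1044|198|13.21767353|46.07516943'
)

# The same row after deleting every double quote, as done in Notepad.
stripped = quoted.replace('"', "")


def parse(line: str) -> list:
    """Split one pipe-delimited line with '"' as the CSV quote character."""
    return next(csv.reader(io.StringIO(line), delimiter="|", quotechar='"'))


a, b = parse(quoted), parse(stripped)
print(len(a), len(b))  # 17 17
print(a[1])  # <a href="http://foursquare.com" rel="nofollow">foursquare</a>
print(b[1])  # <a href=http://foursquare.com rel=nofollow>foursquare</a>
```

Both lines yield 17 fields; only the content of fields that contained escaped quotes differs.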

Only after that could I use the Import tool in pgAdmin without any problems!