How do I save an R data frame to AWS Redshift when a column exceeds 256 characters?

Asked: 2015-12-15 21:00:42

Tags: r postgresql dplyr amazon-redshift

I am trying to save a data frame to an AWS Redshift database over a Postgres connection established with dplyr's src_postgres function. As shown below, one column of the data frame exceeds 256 characters (some values are even longer). When I try to save this data frame to Redshift with dplyr's copy_to function, I get the error below. Is there any way to increase the character limit so that I can save this data frame to AWS Redshift, or does anyone have other suggestions for getting it there? Thanks.

> nchar(df$text)
[1] 598

> copy_to(conn_dplyr, df, TableName, temporary = FALSE)
Error in postgresqlExecStatement(conn, statement, ...) : 
RS-DBI driver: (could not Retrieve the result : ERROR:  value too long for    type character varying(256)
)

2 Answers:

Answer 0 (score: 0)

This happens because Redshift does not support the TEXT data type. When you declare a column as TEXT, Redshift stores it internally as VARCHAR(256). Instead, declare the column as a wider varchar, e.g. VARCHAR(1000), sized to the maximum length you expect in the incoming values.
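From R, this advice can be applied through copy_to's types parameter, overriding the default varchar(256) mapping for the offending column. A minimal sketch, reusing conn_dplyr, df and TableName from the question and assuming the long column is named text and that 1000 characters is a sufficient maximum; depending on your dplyr version, the types vector may need to be unnamed and given in column order:

library(dplyr)

copy_to(
  conn_dplyr, df, TableName,
  temporary = FALSE,
  # widen the text column instead of the implicit varchar(256)
  types = c(text = 'varchar(1000)')
)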

Answer 1 (score: 0)

I had a very similar issue recently and found a sort of workaround. It is not very elegant, but it worked.

rClassToRedshiftType <- function(class) {
  switch(class,
    factor = { return('VARCHAR(256)') },
    character = { return('VARCHAR(65535)') },
    logical = { return('boolean') },
    numeric = { return('float') },
    integer = { return('int') }
  )
  return('timestamp')
}

Then added a simple lookup function:

getColumnClasses <- function(df) {
  return(data.frame(lapply(df[1, ], class)))
}

getRedshiftTypesForDataFrame <- function(df) {
  return(
    apply(
      getColumnClasses(df), 2,
      FUN = rClassToRedshiftType
    )
  )
}

Finally, you can call copy_to using the parameter types:

dplyr::copy_to(
  connection, df, table.name,
  temporary = FALSE,
  types = getRedshiftTypesForDataFrame(df)
)

Obviously, if you know the columns in advance, you can define the types vector manually.
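For instance, a hypothetical data frame with an integer id, a long character column, and a timestamp could be written with a hand-built vector (the column order and widths here are illustrative assumptions, not taken from the question):

dplyr::copy_to(
  connection, df, table.name,
  temporary = FALSE,
  # one type per column, in the data frame's column order
  types = c('int', 'varchar(65535)', 'timestamp')
)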