AWS Data Pipeline: copying log data from S3 to Redshift

Time: 2016-03-25 10:42:24

Tags: amazon-web-services amazon-s3 amazon-redshift amazon-data-pipeline

I have set up an AWS Data Pipeline to import some CSV log data from S3 into a Redshift cluster.

My Redshift table has the following structure:

CREATE TABLE access_log
(
  id bigint identity(1, 1),
  host character varying(64),
  cf_host character varying(64),
  xff_host character varying(64),
  event_time timestamp,
  method character varying(16),
  url text,
  response_code integer,
  referer text,
  user_agent text,
  device_id character varying(40),
  primary key(id)
)
sortkey(id);

Here is an excerpt of my CSV log data:

"172.20.2.224","null","null","2016-03-16 00:01:28","GET","/","302","null","null"
"172.20.2.224","null","null","2016-03-16 00:01:33","GET","/","200","null","null"
"172.20.2.224","null","null","2016-03-16 00:11:28","GET","/","302","null","null"
"172.20.2.224","null","null","2016-03-16 00:11:33","GET","/","200","null","null"
"172.20.2.224","null","null","2016-03-16 00:21:28","GET","/","302","null","null"
"172.20.2.224","null","null","2016-03-16 00:21:33","GET","/","200","null","null"

If I run the following COPY command from SQL Workbench/J, everything works fine:

copy access_log
from 's3://mylogrepo' 
credentials
'aws_access_key_id=myaccesskey;aws_secret_access_key=myaccesskeysecret'
DELIMITER ','
REMOVEQUOTES
TIMEFORMAT 'YYYY-MM-DD HH:MI:SS'

But when the Redshift copy activity runs in the pipeline, I get the following error:

[Amazon](500310) Invalid operation: cannot set an identity column to a value;

What I find interesting is this part of the error stack trace:

  

private.com.amazonaws.services.datapipeline.redshift.QueryStatementException: Exception [Amazon](500310) Invalid operation: cannot set an identity column to a value; while executing START TRANSACTION; INSERT INTO public.access_log SELECT s.* FROM staging s LEFT JOIN public.access_log t ON s."id" = t."id" WHERE t."id" IS NULL; COMMIT;
  at private.com.amazonaws.services.datapipeline.redshift.RedshiftQueryStatement.<init>(RedshiftQueryStatement.java:43)
  at private.com.amazonaws.services.datapipeline.redshift.RedshiftQueryStatementFactory.newQueryStatement(RedshiftQueryStatementFactory.java:9)
  at ...
  private.com.amazonaws.services.datapipeline.redshift.SqlHelper.prepareStatement(SqlHelper.java:84)
  at $ TaskRunner.run(HeartbeatingTaskRunner.java:34) ... 1 more
Caused by: java.sql.SQLException: [Amazon](500310) Invalid operation: cannot set an identity column to a value;
  at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(Unknown Source)

Is it possible that the IP address in my CSV data is being interpreted as the id column?

Thanks!

1 Answer:

Answer 0: (score: 0)

I suspect your S3 column mapping is invalid. Could you share your column mapping?
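
If the mapping does turn out to be the problem, one thing worth trying is an explicit column list in the COPY, which leaves out id so that Redshift generates it. This is only a minimal sketch: it assumes each CSV row carries the non-identity columns in table order (adjust the list to the fields your files actually contain), and the bucket and credentials are the placeholders from the question.

```sql
-- Sketch only: name every column except the identity column "id",
-- so COPY never tries to supply a value for it.
-- Assumption: the CSV fields appear in exactly this order.
copy access_log (
  host,
  cf_host,
  xff_host,
  event_time,
  method,
  url,
  response_code,
  referer,
  user_agent,
  device_id
)
from 's3://mylogrepo'
credentials
'aws_access_key_id=myaccesskey;aws_secret_access_key=myaccesskeysecret'
DELIMITER ','
REMOVEQUOTES
TIMEFORMAT 'YYYY-MM-DD HH:MI:SS';
```

Note also that the statement in the stack trace is not the COPY itself but the follow-up INSERT ... SELECT s.* from the staging table. Since s.* includes id, the error may come from that merge step rather than from how the CSV is parsed. I have not verified whether a different pipeline configuration avoids it.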