I have set up an AWS Data Pipeline to import some CSV log data from S3 into a Redshift cluster.
My Redshift table has the following structure:
CREATE TABLE access_log
(
id bigint identity(1, 1),
host character varying(64),
cf_host character varying(64),
xff_host character varying(64),
event_time timestamp,
method character varying(16),
url text,
response_code integer,
referer text,
user_agent text,
device_id character varying(40),
primary key(id)
)
sortkey(id);
Here is an excerpt of my CSV log data:
"172.20.2.224","null","null","2016-03-16 00:01:28","GET","/","302","null","null"
"172.20.2.224","null","null","2016-03-16 00:01:33","GET","/","200","null","null"
"172.20.2.224","null","null","2016-03-16 00:11:28","GET","/","302","null","null"
"172.20.2.224","null","null","2016-03-16 00:11:33","GET","/","200","null","null"
"172.20.2.224","null","null","2016-03-16 00:21:28","GET","/","302","null","null"
"172.20.2.224","null","null","2016-03-16 00:21:33","GET","/","200","null","null"
If I run the following COPY command from SQLWorkbenchJ, everything works fine:
copy access_log
from 's3://mylogrepo'
credentials
'aws_access_key_id=myaccesskey;aws_secret_access_key=myaccesskeysecret'
DELIMITER ','
REMOVEQUOTES
TIMEFORMAT 'YYYY-MM-DD HH:MI:SS'
But when the RedshiftCopyActivity runs in the pipeline, I get the following error:
[Amazon](500310) Invalid operation: cannot set an identity column to a value;
What I find interesting is this part of the error stack trace:
private.com.amazonaws.services.datapipeline.redshift.QueryStatementException: Exception [Amazon](500310) Invalid operation: cannot set an identity column to a value; while executing START TRANSACTION; INSERT INTO public.access_log SELECT s.* FROM staging s LEFT JOIN public.access_log t ON s."id" = t."id" WHERE t."id" IS NULL; COMMIT;
    at private.com.amazonaws.services.datapipeline.redshift.RedshiftQueryStatement.(RedshiftQueryStatement.java:43)
    at private.com.amazonaws.services.datapipeline.redshift.RedshiftQueryStatementFactory.newQueryStatement(RedshiftQueryStatementFactory.java:9)
    at ... private.com.amazonaws.services.datapipeline.redshift.SqlHelper.prepareStatement(SqlHelper.java:84)
    at $ TaskRunner.run(HeartbeatingTaskRunner.java:34)
    ... 1 more
Caused by: java.sql.SQLException: [Amazon](500310) Invalid operation: cannot set an identity column to a value;
    at com.amazon.redshift.client.messages.inbound.ErrorResponse.toErrorException(Unknown Source)
Is it possible that the IP address in my CSV data is being interpreted as the id column?
Thanks!
Answer 0 (score: 0)
I suspect your S3 column mapping is invalid. Can you share your column mapping?
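
For context, the statement in your stack trace inserts `s.*` from the staging table, which includes `id`, and Redshift rejects any explicit value for an identity column. Below is a minimal sketch of the same insert with an explicit column list that leaves `id` out. It assumes the staging table has the same column names as `access_log`. Whether your pipeline lets you override the generated statement is something I cannot confirm from what you posted, so treat this as an illustration of the cause, not a drop-in fix. It also omits the duplicate check on `id` that the generated statement performs.

```sql
-- Sketch only: assumes "staging" mirrors the columns of public.access_log.
-- The identity column "id" is deliberately left out of both lists,
-- so Redshift generates its value instead of receiving one.
INSERT INTO public.access_log
    (host, cf_host, xff_host, event_time, method,
     url, response_code, referer, user_agent, device_id)
SELECT
    s.host, s.cf_host, s.xff_host, s.event_time, s.method,
    s.url, s.response_code, s.referer, s.user_agent, s.device_id
FROM staging s;
```

Your manual COPY works because, when no column list is given and EXPLICIT_IDS is not specified, COPY skips identity columns and generates their values automatically.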