Transferring data from DynamoDB to S3

Time: 2015-06-10 18:20:58

Tags: amazon-web-services amazon-s3 backup amazon-dynamodb amazon-data-pipeline

I have to back up a DynamoDB table to S3, but when I start this service I get the following error after three attempts:

  

private.com.amazonaws.AmazonServiceException: User: arn:aws:sts::769870455028:assumed-role/DataPipelineDefaultResourceRole/i-3678d99c is not authorized to perform: elasticmapreduce:ModifyInstanceGroups (Service: AmazonElasticMapReduce; Status Code: 400; Error Code: AccessDeniedException; Request ID: 9065ea77-0f95-11e5-8f35-39a70915a1ef)
    at private.com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1077)
    at private.com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:725)
    at private.com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:460)
    at private.com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:295)
    at private.com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient.invoke(AmazonElasticMapReduceClient.java:1391)
    at private.com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient.modifyInstanceGroups(AmazonElasticMapReduceClient.java:785)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at private.com.amazonaws.services.datapipeline.retrier.RetryProxy.invokeInternal(RetryProxy.java:36)
    at private.com.amazonaws.services.datapipeline.retrier.RetryProxy.invoke(RetryProxy.java:48)
    at com.sun.proxy.$Proxy33.modifyInstanceGroups(Unknown Source)
    at amazonaws.datapipeline.cluster.EmrUtil.acquireCoreNodes(EmrUtil.java:325)
    at amazonaws.datapipeline.activity.AbstractClusterActivity.resizeIfRequired(AbstractClusterActivity.java:47)
    at amazonaws.datapipeline.activity.AbstractHiveActivity.runActivity(AbstractHiveActivity.java:113)
    at amazonaws.datapipeline.objects.AbstractActivity.run(AbstractActivity.java:16)
    at amazonaws.datapipeline.taskrunner.TaskPoller.executeRemoteRunner(TaskPoller.java:132)
    at amazonaws.datapipeline.taskrunner.TaskPoller.executeTask(TaskPoller.java:101)
    at amazonaws.datapipeline.taskrunner.TaskPoller$1.run(TaskPoller.java:77)
    at private.com.amazonaws.services.datapipeline.poller.PollWorker.executeWork(PollWorker.java:76)
    at private.com.amazonaws.services.datapipeline.poller.PollWorker.run(PollWorker.java:53)
    at java.lang.Thread.run(Thread.java:745)

How can I do the backup? Has anyone else run into this error? Thanks

Edit: new policy

  

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:*",
                "dynamodb:*",
                "ec2:Describe*",
                "elasticmapreduce:Describe*",
                "elasticmapreduce:ListInstance*",
                "elasticmapreduce:AddJobFlowSteps",
                "elasticmapreduce:*",
                "rds:Describe*",
                "datapipeline:*",
                "cloudwatch:*",
                "redshift:DescribeClusters",
                "redshift:DescribeClusterSecurityGroups",
                "sdb:*",
                "sns:*",
                "sqs:*"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}

This is the new exception:

  

Error during job, obtaining debugging information...
Examining task ID: task_1434014832347_0001_m_000008 (and more) from job job_1434014832347_0001
Examining task ID: task_1434014832347_0001_m_000013 (and more) from job job_1434014832347_0001
Examining task ID: task_1434014832347_0001_m_000005 (and more) from job job_1434014832347_0001
Examining task ID: task_1434014832347_0001_m_000034 (and more) from job job_1434014832347_0001
Examining task ID: task_1434014832347_0001_m_000044 (and more) from job job_1434014832347_0001
Examining task ID: task_1434014832347_0001_m_000004 (and more) from job job_1434014832347_0001
Task with the most failures(4):
-----
Task ID: task_1434014832347_0001_m_000002
URL: http://ip-10-37-138-149.eu-west-1.compute.internal:9026/taskdetails.jsp?jobid=job_1434014832347_0001&tipid=task_1434014832347_0001_m_000002
-----
Diagnostic Messages for this Task:
Error: Java heap space
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs

1 answer:

Answer 0: (score: 1)

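The Data Pipeline agent (TaskRunner) running on the EMR cluster is trying to resize the cluster, and that call is failing. The resource role you passed to the EMR cluster does not have permission to call the AmazonElasticMapReduce::modifyInstanceGroups API.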

I just looked at the DefaultResourceRolePolicy that the console wizard creates (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-iam-roles.html). These are the actions it allows for EMR: "elasticmapreduce:Describe*", "elasticmapreduce:ListInstance*", "elasticmapreduce:AddJobFlowSteps"
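To see why the call is rejected, the three EMR patterns quoted above can be tested against the action named in the exception. This is only a rough local sketch: `is_action_allowed` is a hypothetical helper, it approximates IAM's case-insensitive `*`/`?` wildcard matching with Python's `fnmatch`, and it ignores Deny statements, resources and conditions.

```python
from fnmatch import fnmatchcase

# EMR actions listed in the answer for the wizard-generated resource role policy.
DEFAULT_EMR_ACTIONS = [
    "elasticmapreduce:Describe*",
    "elasticmapreduce:ListInstance*",
    "elasticmapreduce:AddJobFlowSteps",
]


def is_action_allowed(action, allowed_patterns):
    """Return True if any allowed pattern matches the action.

    Approximation only: compares lowercased strings with shell-style
    wildcards and does not evaluate a full IAM policy.
    """
    action = action.lower()
    return any(fnmatchcase(action, pattern.lower()) for pattern in allowed_patterns)


if __name__ == "__main__":
    needed = "elasticmapreduce:ModifyInstanceGroups"
    # With the default patterns only.
    print(is_action_allowed(needed, DEFAULT_EMR_ACTIONS))
    # With the broader wildcard added.
    print(is_action_allowed(needed, DEFAULT_EMR_ACTIONS + ["elasticmapreduce:*"]))
```

None of the three default patterns covers `ModifyInstanceGroups`, while a pattern such as `elasticmapreduce:*` does.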

As you can see, it does not allow ModifyInstanceGroups. Please update your resource role policy to allow that action, e.g. "elasticmapreduce:*"
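One possible way to make that change from a script is sketched below. The helper names `allow_action` and `apply_policy` are made up for illustration, and the sketch assumes the role carries the policy as an inline policy and that boto3 and IAM credentials are available; `put_role_policy` is the boto3 IAM client call for writing an inline role policy. The policy and role names in the usage comment are placeholders, not values from the question.

```python
import copy
import json


def allow_action(policy, action):
    """Return a copy of `policy` whose first Allow statement also lists `action`."""
    patched = copy.deepcopy(policy)
    for statement in patched["Statement"]:
        if statement.get("Effect") == "Allow":
            actions = statement["Action"]
            if isinstance(actions, str):
                # IAM accepts a single string here; normalise it to a list.
                actions = [actions]
            if action not in actions:
                actions.append(action)
            statement["Action"] = actions
            return patched
    raise ValueError("policy has no Allow statement to extend")


def apply_policy(role_name, policy_name, policy):
    """Hypothetical helper: store `policy` as an inline policy on the role.

    Assumption: boto3 is installed and the caller may modify the role.
    """
    import boto3  # imported lazily so the rest of the sketch runs without it

    boto3.client("iam").put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy),
    )


if __name__ == "__main__":
    current = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["elasticmapreduce:Describe*"],
                "Resource": ["*"],
            }
        ],
    }
    patched = allow_action(current, "elasticmapreduce:ModifyInstanceGroups")
    print(json.dumps(patched, indent=4))
    # apply_policy("DataPipelineDefaultResourceRole", "<policy-name>", patched)
```

Adding only `elasticmapreduce:ModifyInstanceGroups` is narrower than the `elasticmapreduce:*` wildcard suggested in the answer; either one covers the call that failed.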

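Thanks for reporting this error. In the meantime, we will work on fixing the default resource role policy generated by the console wizard.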

Aravind R.