DynamoDB backup via AWS Data Pipeline and EMR

Time: 2016-05-31 10:10:12

Tags: amazon-web-services amazon-dynamodb emr amazon-data-pipeline

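We are trying to back up a DynamoDB table to S3 with AWS Data Pipeline. We are using the default template provided by AWS (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-part2.html). However, the job always fails with the error below. Changing the EMR version does not change the error message.

Does anyone know what might be causing this error?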

31 May 2016 09:57:10,013 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.taskrunner.TaskPoller: Executing: amazonaws.datapipeline.activity.EmrActivity@523f31f2
31 May 2016 09:57:10,086 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.EmrActivity: EMR transform starting.
31 May 2016 09:57:10,093 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrClient: EMR client waiting for cluster to enter ready state for jobflow id 'j-2TUYGWQ1PYAHC'.
31 May 2016 09:57:10,094 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrClient: EMR client checking if cluster is ready for jobflow with id 'j-2TUYGWQ1PYAHC'.
31 May 2016 09:57:10,226 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrClient: EMR client reports that cluster with jobflow id 'j-2TUYGWQ1PYAHC' is ready.
31 May 2016 09:57:10,320 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrClient: EMR client adding steps with request '{JobFlowId: j-2TUYGWQ1PYAHC,Steps: [{Name: df-09387105FF7URCW5QOR_@TableBackupActivity_2016-05-30T12:58:18_Attempt=4,ActionOnFailure: CONTINUE,HadoopJarStep: {Properties: [],Jar: s3://dynamodb-emr-eu-west-1/emr-ddb-storage-handler/2.1.0/emr-ddb-2.1.0.jar,Args: [org.apache.hadoop.dynamodb.tools.DynamoDbExport, s3://my-db-backup.dev01.rule//2016-05-30-12-58-18, my-db.dev01.rule, 0.25]}}]}'
31 May 2016 09:58:10,506 [WARN] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrUtil: EMR job flow named 'df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18' with jobFlowId 'j-2TUYGWQ1PYAHC' is in status 'WAITING' because of the step 'df-09387105FF7URCW5QOR_@TableBackupActivity_2016-05-30T12:58:18_Attempt=4' failures 'null'
31 May 2016 09:58:10,507 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrUtil: EMR job '@TableBackupActivity_2016-05-30T12:58:18_Attempt=4' with jobFlowId 'j-2TUYGWQ1PYAHC' is in  status 'WAITING' and reason 'Cluster ready after last step completed.'. Step 'df-09387105FF7URCW5QOR_@TableBackupActivity_2016-05-30T12:58:18_Attempt=4' is in status 'FAILED' with reason 'null'
31 May 2016 09:58:10,507 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrUtil: Collecting steps stderr logs for cluster with AMI 2.4.8
31 May 2016 09:58:10,517 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.taskrunner.LogMessageUtil: Returning tail errorMsg :Exception in thread "main" java.lang.NoClassDefFoundError: com/amazon/ws/emr/core/InstanceInfo
    at org.apache.hadoop.dynamodb.DynamoDBUtil.getDynamoDBEndpoint(DynamoDBUtil.java:268)
    at org.apache.hadoop.dynamodb.DynamoDBClient.initConfigurations(DynamoDBClient.java:369)
    at org.apache.hadoop.dynamodb.DynamoDBClient.<init>(DynamoDBClient.java:88)
    at org.apache.hadoop.dynamodb.DynamoDBClient.<init>(DynamoDBClient.java:83)
    at org.apache.hadoop.dynamodb.tools.DynamoDbExport.setTableProperties(DynamoDbExport.java:93)
    at org.apache.hadoop.dynamodb.tools.DynamoDbExport.run(DynamoDbExport.java:75)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.dynamodb.tools.DynamoDbExport.main(DynamoDbExport.java:30)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:187)
Caused by: java.lang.ClassNotFoundException: com.amazon.ws.emr.core.InstanceInfo
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    ... 13 more
31 May 2016 09:58:10,517 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.cluster.EmrUtil: Collecting steps logs for cluster with AMI/ReleaseLabel 2.4.8
31 May 2016 09:58:10,518 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelperFactory: Getting the helper for version 1.0.3
31 May 2016 09:58:10,518 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Uploading step log details
31 May 2016 09:58:10,518 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: path to step logss3n://my-db.dev01.rule-logs/df-09387105FF7URCW5QOR/EmrClusterForBackup/@EmrClusterForBackup_2016-05-30T12:58:18/@EmrClusterForBackup_2016-05-30T12:58:18_Attempt=2/j-2TUYGWQ1PYAHC/steps
31 May 2016 09:58:10,518 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: step log file /mnt/taskRunner/output/logs/df-09387105FF7URCW5QOR/TableBackupActivity/@TableBackupActivity_2016-05-30T12:58:18/@TableBackupActivity_2016-05-30T12:58:18_Attempt=4/hadoop.jobs.log
31 May 2016 09:58:10,522 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Done uploading hadoop log details
31 May 2016 09:58:10,763 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Field value updated 
31 May 2016 09:58:10,763 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Done updating the field with value 
31 May 2016 09:58:10,767 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.taskrunner.HeartBeatService: Finished waiting for heartbeat thread @TableBackupActivity_2016-05-30T12:58:18_Attempt=4
31 May 2016 09:58:10,767 [INFO] (TaskRunnerService-df-09387105FF7URCW5QOR_@EmrClusterForBackup_2016-05-30T12:58:18-0) df-09387105FF7URCW5QOR amazonaws.datapipeline.taskrunner.TaskPoller: Work EmrActivity took 1:0 to complete

2 answers:

Answer 0 (score: 0)

You may be using EMR 4.x. I suggest trying AMI 3.8.0. If you still run into problems, let us know.
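As a rough illustration of that suggestion, here is a minimal Python sketch that pins the cluster's AMI version in a pipeline definition before it is uploaded. The object id `EmrClusterForBackup` is taken from the log above; the helper name `pin_ami_version`, the example definition, and the `emr-4.6.0` release label are hypothetical. Dropping `releaseLabel` when setting `amiVersion` is an assumption made here so the two fields do not conflict.

```python
import copy
import json


def pin_ami_version(definition, ami_version="3.8.0"):
    """Return a copy of a pipeline definition with every EmrCluster pinned to ami_version.

    Hypothetical helper: it only edits the dictionary, it does not call AWS.
    """
    patched = copy.deepcopy(definition)
    for obj in patched.get("objects", []):
        if obj.get("type") == "EmrCluster":
            obj["amiVersion"] = ami_version
            # Assumption: remove releaseLabel so it does not compete with amiVersion.
            obj.pop("releaseLabel", None)
    return patched


# Hypothetical, heavily trimmed definition; a real template has many more fields.
example = {
    "objects": [
        {"id": "EmrClusterForBackup", "type": "EmrCluster", "releaseLabel": "emr-4.6.0"},
        {"id": "TableBackupActivity", "type": "EmrActivity",
         "runsOn": {"ref": "EmrClusterForBackup"}},
    ]
}

patched = pin_ami_version(example)
print(json.dumps(patched, indent=2))
```

The patched dictionary can then be saved as JSON and used as the pipeline definition; the original dictionary is left untouched.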

Answer 1 (score: 0)

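One question: are you running the pipeline from the web console, or programmatically? I ask because you should check that every field is filled in correctly. You may have left out the region, in which case it cannot find a method signature matching an empty parameter where a String is expected (e.g. eu-west-1).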
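A quick way to run that check when the pipeline is created programmatically is sketched below. The parameter names in `REQUIRED` are assumptions modelled on the export template's naming and may differ in your definition; the helper `find_blank_parameters` is hypothetical. The table name, bucket, and ratio are the values visible in the log above.

```python
def find_blank_parameters(values, required):
    """Return the names in `required` that are missing or blank in `values`."""
    blank = []
    for name in required:
        value = values.get(name)
        if value is None or not str(value).strip():
            blank.append(name)
    return blank


# Assumed parameter names; adjust to whatever your pipeline definition declares.
REQUIRED = ["myDDBRegion", "myDDBTableName", "myOutputS3Loc", "myDDBReadThroughputRatio"]

values = {
    "myDDBRegion": "",  # left empty on purpose to show the check firing
    "myDDBTableName": "my-db.dev01.rule",
    "myOutputS3Loc": "s3://my-db-backup.dev01.rule/",
    "myDDBReadThroughputRatio": "0.25",
}

print(find_blank_parameters(values, REQUIRED))
```

Any name it reports should be filled in before the pipeline is activated.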

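You can trace the code flow at https://github.com/awslabs/emr-dynamodb-connector/blob/master/emr-dynamodb-tools/src/main/java/org/apache/hadoop/dynamodb/tools/DynamoDBExport.java. Bear in mind that this class may not be the same version as the one you are running, so the line numbers may not match. It should still give you a rough idea of what is happening there.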