Creating a Spark step with Python on Amazon EMR (Hadoop)

Time: 2016-08-25 07:21:58

Tags: python hadoop hive pyspark amazon-emr

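I'm creating a Spark step on Amazon with Hadoop, but the step just keeps hanging. I can't tell whether my code is bad or I'm submitting the wrong command, and I can't find a way out.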

This is the command I pass:

spark-submit --deploy-mode cluster --master yarn --num-executors 5 --executor-cores 5 --executor-memory 1g s3://URL-S3/scripts/test.py

The script:

import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('TestSpark')

table.put_item(
    Item={
        'app_token': "1a",
        'advertising_id': "1b",
    }
)

This is what I keep getting back:

16/08/25 07:06:22 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:23 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:24 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:25 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:26 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:27 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:28 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:29 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:30 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:31 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:32 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:33 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:34 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:35 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:36 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:37 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:38 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:39 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:40 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:41 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)
16/08/25 07:06:42 INFO Client: Application report for application_1472106590712_0002 (state: ACCEPTED)

Error log:

2016-08-25T07:30:14.769Z INFO Step created jobs: 
2016-08-25T07:30:14.769Z WARN Step failed with exitCode 1 and took 1062 seconds

THX!

Update: it now fails with an error, even though I had installed the module beforehand:

ImportError: No module named boto3

3 answers:

Answer 0 (score: 1)

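Your application is waiting for YARN resources. Go to the ResourceManager URL and check whether enough resources are available and whether the job is going to the correct queue. The YARN ResourceManager logs will tell you the reason.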
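The same check can be scripted. The following is a minimal sketch, assuming the ResourceManager REST API is reachable on its default port 8088 and that the `/ws/v1/cluster/metrics` endpoint is enabled on your cluster; the function names and the host argument are illustrative and not part of the original question.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_host, port=8088):
    """Fetch cluster metrics from the YARN ResourceManager REST API.

    rm_host is an assumption: use the master node's address of your cluster.
    """
    url = "http://%s:%d/ws/v1/cluster/metrics" % (rm_host, port)
    with urlopen(url, timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))["clusterMetrics"]


def can_schedule(metrics, needed_mb, needed_vcores):
    """Return True if the free memory and vcores cover the request."""
    return (metrics["availableMB"] >= needed_mb
            and metrics["availableVirtualCores"] >= needed_vcores)


# Example with a hand-written metrics dict (not real cluster output):
sample = {"availableMB": 4096, "availableVirtualCores": 8}
print(can_schedule(sample, needed_mb=2048, needed_vcores=4))
```

On a live cluster you would call `can_schedule(fetch_cluster_metrics("<master-host>"), ...)` instead of using the sample dict. If this returns False while the job sits in ACCEPTED, a lack of resources is a likely cause.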

Answer 1 (score: 1)

I don't work with Amazon EMR, but in Hadoop this happens when your application waits too long for YARN resources.

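The resource negotiator cannot allocate the requested resources, so try reducing the resources your job asks for. Also look at the logs.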
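To get a feel for how much the spark-submit command in the question asks for, here is a rough estimate. It is a sketch under assumptions: the per-container memory overhead is taken as max(384 MB, 10% of the heap), which is Spark's documented default, the driver is assumed to use the default 1 GB and 1 core in cluster mode, and YARN's rounding up to its minimum allocation size is ignored.

```python
def estimate_yarn_request(num_executors, executor_cores, executor_mem_mb,
                          driver_mem_mb=1024, driver_cores=1):
    """Roughly estimate the (memory MB, vcores) a Spark job requests from YARN."""

    def with_overhead(mem_mb):
        # Assumed default overhead: 10% of the heap, at least 384 MB.
        return mem_mb + max(384, int(0.10 * mem_mb))

    total_mb = (num_executors * with_overhead(executor_mem_mb)
                + with_overhead(driver_mem_mb))
    total_vcores = num_executors * executor_cores + driver_cores
    return total_mb, total_vcores


# --num-executors 5 --executor-cores 5 --executor-memory 1g
print(estimate_yarn_request(5, 5, 1024))  # (8448, 26)
```

If the cluster offers fewer vcores or less memory than this, lowering `--num-executors` or `--executor-cores` is the first thing to try.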

Read through: this

Also check the status of YARN:

sudo service hadoop-yarn-nodemanager status
sudo service hadoop-yarn-resourcemanager status

Answer 2 (score: 0)

Found the error.

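The boto3 module was not installed. I had installed it from the console, but the step still failed because the module has to be installed on every instance. So what I did was create another cluster that runs a bootstrap action to update Python, and installed the boto3 module through it.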
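For anyone launching the cluster from code, a bootstrap action can be attached roughly like this. It is only a sketch: the S3 path is hypothetical, the script it points to is assumed to contain something like `sudo pip install boto3`, and `base_config` stands for whatever other `run_job_flow` arguments (name, release label, instances, roles) your cluster needs.

```python
def bootstrap_actions(script_s3_path):
    """Build the BootstrapActions argument for EMR's run_job_flow.

    script_s3_path is a hypothetical location of a shell script that
    installs boto3; bootstrap actions run on every node of the cluster.
    """
    return [{
        "Name": "Install boto3",
        "ScriptBootstrapAction": {"Path": script_s3_path, "Args": []},
    }]


def launch_cluster(emr_client, base_config, script_s3_path):
    """Start a cluster with the bootstrap action added to base_config."""
    config = dict(base_config)  # copy so the caller's dict is not modified
    config["BootstrapActions"] = bootstrap_actions(script_s3_path)
    return emr_client.run_job_flow(**config)
```

Here `emr_client` would be `boto3.client('emr')` on a machine with AWS credentials configured. Because the bootstrap action runs on all instances, the executors can import boto3 as well, not just the node you logged into.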