Running mrjob on Amazon EMR: t2.micro is not supported

Time: 2015-07-31 01:39:21

Tags: python hadoop amazon-web-services emr mrjob

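I am trying to run an mrjob script on Amazon EMR. It runs fine when I use a c1.medium instance, but it fails when I change the instance type to t2.micro. The full error message is shown below.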

  

C:\Users\Administrator\MyIpython>python word_count.py -r emr 111.txt
using configs in C:\Users\Administrator\.mrjob.conf
creating new scratch bucket mrjob-875a948553aab9e8
using s3://mrjob-875a948553aab9e8/tmp/ as our scratch dir on S3
creating tmp directory c:\users\admini~1\appdata\local\temp\word_count.Administrator.20150731.013007.592000
writing master bootstrap script to c:\users\admini~1\appdata\local\temp\word_count.Administrator.20150731.013007.592000\b.py

PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols

creating S3 bucket 'mrjob-875a948553aab9e8' to use as scratch space
Copying non-input files into s3://mrjob-875a948553aab9e8/tmp/word_count.Administrator.20150731.013007.592000/files/
Waiting 5.0s for S3 eventual consistency
Creating Elastic MapReduce job flow
Traceback (most recent call last):
  File "word_count.py", line 16, in <module>
    MRWordFrequencyCount.run()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\job.py", line 461, in run
    mr_job.execute()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\job.py", line 479, in execute
    super(MRJob, self).execute()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\launch.py", line 153, in execute
    self.run_job()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\launch.py", line 216, in run_job
    runner.run()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\runner.py", line 470, in run
    self._run()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\emr.py", line 881, in _run
    self._launch()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\emr.py", line 886, in _launch
    self._launch_emr_job()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\emr.py", line 1593, in _launch_emr_job
    persistent=False)
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\emr.py", line 1327, in _create_job_flow
    self._job_name, self._opts['s3_log_uri'], **args)
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\retry.py", line 149, in call_and_maybe_retry
    return f(*args, **kwargs)
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\retry.py", line 71, in call_and_maybe_retry
    result = getattr(alternative, name)(*args, **kwargs)
  File "F:\Program Files\Anaconda\lib\site-packages\boto\emr\connection.py", line 581, in run_jobflow
    'RunJobFlow', params, RunJobFlowResponse, verb='POST')
  File "F:\Program Files\Anaconda\lib\site-packages\boto\connection.py", line 1208, in get_object
    raise self.ResponseError(response.status, response.reason, body)
boto.exception.EmrResponseError: EmrResponseError: 400 Bad Request
    Sender
    ValidationError
    Instance type 't2.micro' is not supported
    c3ee1107-3723-11e5-8d8e-f1011298229d

Here are the details of my configuration file:

runners:
  emr:
    aws_access_key_id: xxxxxxxxxxx
    aws_secret_access_key: xxxxxxxxxxxxx
    aws_region: us-east-1
    ec2_key_pair: EMR
    ec2_key_pair_file: C:\Users\Administrator\EMR.pem
    ssh_tunnel_to_job_tracker: false
    ec2_instance_type: t2.micro
    num_ec2_instances: 2

1 answer:

Answer 0 (score: 0)

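EMR does not support t2 instance types. If you are worried about cost, spot instances are a very cost-effective option: at the moment an m1.xlarge costs less than $0.05 per hour and an m1.medium costs $0.01 per hour (cheaper anyway). The supported types are as follows (screenshots from the EMR web console):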

[Screenshots: instance types offered in the EMR web console]
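
To make the spot-instance suggestion concrete, here is a rough sketch that builds an `emr` runner section requesting spot capacity and prints it as JSON (JSON is also valid YAML, so the output can be adapted for mrjob.conf). The option names `ec2_master_instance_bid_price` and `ec2_core_instance_bid_price` are assumed from mrjob 0.4.x and should be checked against the documentation for your installed version. The helper `spot_runner_config` is hypothetical, and the bid value is a made-up placeholder rather than a real market price.

```python
import json


def spot_runner_config(instance_type, bid_price, num_instances):
    """Build an mrjob 'emr' runner section that bids for spot capacity.

    Option names are assumed from mrjob 0.4.x and may differ in other versions.
    """
    if instance_type.startswith("t2."):
        # The t2 family is the one EMR rejected in the question.
        raise ValueError("instance type %r is not supported by EMR" % instance_type)
    return {
        "runners": {
            "emr": {
                "ec2_instance_type": instance_type,
                "num_ec2_instances": num_instances,
                # Hypothetical bids, in US dollars per hour.
                "ec2_master_instance_bid_price": bid_price,
                "ec2_core_instance_bid_price": bid_price,
            }
        }
    }


if __name__ == "__main__":
    # "0.02" is a placeholder bid, not a current spot price.
    conf = spot_runner_config("m1.medium", "0.02", 2)
    print(json.dumps(conf, indent=2, sort_keys=True))
```

Credentials, region, and key pair settings are left out on purpose; they would be merged in from the existing configuration file.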