I am following the guide on getting mrjob to work on EMR. I followed all the steps, but when I run the example script I get this error:
matthew@WinterMute:~/work/projects/mrjob_examples$ python word_count.py -r emr moby.txt
using configs in /etc/mrjob.conf
using existing scratch bucket mrjob-4db6342a70e021ad
using s3://mrjob-4db6342a70e021ad/tmp/ as our scratch dir on S3
creating tmp directory /tmp/word_count.matthew.20140603.181541.006786
writing master bootstrap script to /tmp/word_count.matthew.20140603.181541.006786/b.py
Copying non-input files into s3://mrjob-4db6342a70e021ad/tmp/word_count.matthew.20140603.181541.006786/files/
Waiting 5.0s for S3 eventual consistency
Creating Elastic MapReduce job flow
Job flow created with ID: j-3DCN7LULSRILW
Created new job flow j-3DCN7LULSRILW
Job on job flow j-3DCN7LULSRILW failed with status FAILED: The given SSH key name was invalid
Logs are in s3://mrjob-4db6342a70e021ad/tmp/logs/j-3DCN7LULSRILW/
Scanning S3 logs for probable cause of failure
Waiting 5.0s for S3 eventual consistency
Terminating job flow: j-3DCN7LULSRILW
Traceback (most recent call last):
  File "word_count.py", line 16, in <module>
    MRWordFrequencyCount.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 494, in run
    mr_job.execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 512, in execute
    super(MRJob, self).execute()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 147, in execute
    self.run_job()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 208, in run_job
    runner.run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/runner.py", line 458, in run
    self._run()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/emr.py", line 809, in _run
    self._wait_for_job_to_complete()
  File "/usr/local/lib/python2.7/dist-packages/mrjob/emr.py", line 1599, in _wait_for_job_to_complete
    raise Exception(msg)
Exception: Job on job flow j-3DCN7LULSRILW failed with status FAILED: The given SSH key name was invalid
Answer 0 (score: 0)
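Your job appears to start fine, but mrjob is unable to SSH to the master node to monitor its status. Without seeing your config file (mainly the ec2_key_pair_file and ec2_key_pair options), it is hard to determine the exact cause of the bad setup. Make sure to follow the Configuring AWS credentials guide. You have to specify a valid key pair name (check the "Key Pairs" section of the EC2 management dashboard) and the path to the corresponding .pem file.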
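As a rough illustration of that check, here is a minimal sketch that looks at the two options named above. Only the option names ec2_key_pair and ec2_key_pair_file come from this answer; the helper name check_key_pair_options and the example values are hypothetical. It does not read mrjob.conf itself: you pass in the options from the emr runner section as a dict.

```python
import os

# The two emr runner options this answer says must be set.
REQUIRED_KEY_OPTIONS = ("ec2_key_pair", "ec2_key_pair_file")


def check_key_pair_options(emr_opts):
    """Return a list of problems found in the emr runner options (hypothetical helper)."""
    problems = []
    # Both options must be present and non-empty.
    for name in REQUIRED_KEY_OPTIONS:
        if not emr_opts.get(name):
            problems.append("missing option: " + name)
    # If a .pem path is given, it must point at an existing file.
    pem_path = emr_opts.get("ec2_key_pair_file")
    if pem_path:
        expanded = os.path.expanduser(pem_path)
        if not os.path.isfile(expanded):
            problems.append("key file not found: " + expanded)
    return problems


if __name__ == "__main__":
    # Placeholder values; substitute the ones from your own mrjob.conf.
    example_opts = {
        "ec2_key_pair": "EMR",
        "ec2_key_pair_file": "~/.ssh/EMR.pem",
    }
    for problem in check_key_pair_options(example_opts):
        print(problem)
```

An empty result only means both options are set and the .pem file exists locally. It does not confirm that the key pair name is known to EC2.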
Answer 1 (score: 0)
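I found this question while searching for the same error myself.
I managed to solve it: SSH keys are region-specific, so you need to set the region in your mrjob.conf file to the region your SSH key belongs to: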
runners:
  emr:
    aws_access_key_id: HADOOPHADOOPBOBADOOP
    aws_region: us-west-1
    aws_secret_access_key: MEMIMOMADOOPBANANAFANAFOFADOOPHADOOP
See here: https://pythonhosted.org/mrjob/guides/configs-basics.html
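If you are not sure which region a key pair lives in, one way to check outside of mrjob is to ask EC2 directly. The sketch below is not part of mrjob. It assumes boto3 is installed and AWS credentials are configured, and the function names and the key pair name "EMR" are placeholders. It uses the key-name filter of describe_key_pairs, which returns an empty list rather than raising an error when the name is unknown in that region.

```python
def key_pair_exists(ec2_client, key_name):
    """Return True if key_name is registered in the region ec2_client talks to."""
    response = ec2_client.describe_key_pairs(
        Filters=[{"Name": "key-name", "Values": [key_name]}]
    )
    return bool(response.get("KeyPairs"))


def check_region(key_name, region):
    """Build an EC2 client for one region and look the key pair up there."""
    import boto3  # assumed to be installed; imported lazily so the helper above has no dependency

    client = boto3.client("ec2", region_name=region)
    return key_pair_exists(client, key_name)


# Example call (needs network access and credentials):
# check_region("EMR", "us-west-1")
```

If this returns False for the region in aws_region but True for another region, that mismatch would explain the "SSH key name was invalid" failure described above.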