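I'm new to mrjob and am trying to run the basic word count script from the mrjob documentation. It runs successfully on EMR when I disable the SSH tunnel (ssh_tunnel_to_job_tracker: false). However, if I change the option to true and run the script, I keep getting an error.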
C:\Users\Administrator\MyIpython>python word_count.py -r emr 111.txt
using configs in C:\Users\Administrator\.mrjob.conf
creating new scratch bucket mrjob-5dd9c3ac4aa59e82
using s3://mrjob-5dd9c3ac4aa59e82/tmp/ as our scratch dir on S3
creating tmp directory c:\users\admini~1\appdata\local\temp\word_count.Administrator.20150729.080450.654000
writing master bootstrap script to c:\users\admini~1\appdata\local\temp\word_count.Administrator.20150729.080450.654000\b.py
PLEASE NOTE: Starting in mrjob v0.5.0, protocols will be strict by default. It's recommended you run your job with --strict-protocols or set up mrjob.conf as described at https://pythonhosted.org/mrjob/whats-new.html#ready-for-strict-protocols
creating S3 bucket 'mrjob-5dd9c3ac4aa59e82' to use as scratch space
Copying non-input files into s3://mrjob-5dd9c3ac4aa59e82/tmp/word_count.Administrator.20150729.080450.654000/files/
Waiting 5.0s for S3 eventual consistency
Creating Elastic MapReduce job flow
Job flow created with ID: j-B3GUOBAWHZ29
Created new job flow j-B3GUOBAWHZ29
Job launched 35.3s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 69.0s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 103.2s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 139.3s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 184.7s ago, status STARTING: Provisioning Amazon EC2 capacity
Job launched 219.1s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 253.4s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 287.3s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 323.1s ago, status BOOTSTRAPPING: Running bootstrap actions
Job launched 356.6s ago, status RUNNING: Running step (word_count.Administrator.20150729.080450.654000: Step 1 of 1)
Opening ssh tunnel to Hadoop job tracker
Attempting to terminate job...
Traceback (most recent call last):
  File "word_count.py", line 16, in <module>
    MRWordFrequencyCount.run()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\job.py", line 461, in run
    mr_job.execute()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\job.py", line 479, in execute
    super(MRJob, self).execute()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\launch.py", line 153, in execute
    self.run_job()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\launch.py", line 221, in run_job
    self.stdout.flush()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\runner.py", line 633, in __exit__
    self.cleanup()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\emr.py", line 1146, in cleanup
    super(EMRJobRunner, self).cleanup(mode=mode)
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\runner.py", line 577, in cleanup
    self._cleanup_job()
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\emr.py", line 1218, in _cleanup_job
    self._opts['ec2_key_pair_file'])
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\ssh.py", line 197, in ssh_terminate_single_job
    ssh_bin, address, ec2_key_pair_file, ['hadoop', 'job', '-list']))
  File "F:\Program Files\Anaconda\lib\site-packages\mrjob\ssh.py", line 82, in ssh_run
    p = Popen(args, stdout=PIPE, stderr=PIPE, stdin=PIPE)
  File "F:\Program Files\Anaconda\lib\subprocess.py", line 710, in __init__
    errread, errwrite)
  File "F:\Program Files\Anaconda\lib\subprocess.py", line 958, in _execute_child
    startupinfo)
WindowsError: [Error 2]
Here is my config file:
runners:
  emr:
    aws_access_key_id: xxx
    aws_secret_access_key: xxx
    aws_region: us-east-1
    ec2_key_pair: EMR
    ec2_key_pair_file: C:\Users\Administrator\EMR.pem
    ssh_tunnel_to_job_tracker: true
The path to the pem file should be correct. I don't know what the problem is.
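
The traceback ends inside subprocess.Popen, and Error 2 is the Windows "file not found" code, so one thing that may be worth ruling out is whether an ssh executable is reachable from PATH at all on this machine. Below is a minimal sketch of that check. `find_executable` is a hypothetical helper written for this post, not part of mrjob, and the list of extensions is only an assumption about common Windows executable suffixes.

```python
import os


def find_executable(name, search_path=None,
                    extensions=("", ".exe", ".bat", ".cmd")):
    """Return the first file matching `name` on the search path, or None.

    Hypothetical helper: walks each directory in PATH (or in the
    `search_path` string, if given) and tries each assumed extension.
    """
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        for ext in extensions:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    # Prints None if no ssh binary can be found from this shell.
    print("ssh found at: %s" % find_executable("ssh"))
    # Also confirm the key file from mrjob.conf is where it is expected.
    pem = r"C:\Users\Administrator\EMR.pem"
    print("pem file exists: %s" % os.path.isfile(pem))
```

If this prints None for ssh, that would be consistent with Popen failing before the pem file is ever read. The traceback shows an `ssh_bin` argument being passed into mrjob's ssh helpers, so pointing that setting at a full path to an ssh client might be one way to test this theory, though I have not verified it here.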