I am launching a cluster with the following command.
./elastic-mapreduce --create \
--stream \
--cache s3n://bucket_name/code/totalInstallUsers#totalInstallUsers \
--input s3n://bucket_name/input \
--output s3n://bucket_name/output \
--mapper s3n://bucket_name/code/mapper.py \
--reducer s3n://bucket_name \
--jobflow-role EMR_EC2_DefaultRole \
--service-role EMR_DefaultRole \
--debug \
--log-uri s3n://bucket_name/logs
I always get the following error message. If I remove the --cache argument, the cluster launches successfully.
Error: undefined method `each' for #<String:0x00000002c28ba0>
/home/ubuntu/data_processing/commands.rb:806:in `steps'
/home/ubuntu/data_processing/commands.rb:1232:in `block in enact'
/home/ubuntu/data_processing/commands.rb:1232:in `map'
/home/ubuntu/data_processing/commands.rb:1232:in `enact'
/home/ubuntu/data_processing/commands.rb:49:in `block in enact'
/home/ubuntu/data_processing/commands.rb:49:in `each'
/home/ubuntu/data_processing/commands.rb:49:in `enact'
/home/ubuntu/data_processing/commands.rb:2422:in `create_and_execute_commands'
/home/ubuntu/data_processing/elastic-mapreduce-cli.rb:13:in `'
/usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
/usr/lib/ruby/1.9.1/rubygems/custom_require.rb:36:in `require'
./elastic-mapreduce:6:in `'
The reason I use --cache is that I want mapper.py to be able to open the data file with "with open('./totalInstallUsers', 'r') as infile:".
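For context, here is a minimal sketch of the kind of mapper this is meant to support. The file format (one user ID per line), the tab-separated input with the user ID in the first field, and the helper names load_user_ids, map_lines and main are all assumptions for illustration; the real mapper.py is not shown here.

```python
import sys


def load_user_ids(path="./totalInstallUsers"):
    """Read the cached file into a set, one entry per non-empty line.

    Assumes the distributed cache has linked the file into the task's
    working directory under the name given after '#'.
    """
    with open(path, "r") as infile:
        return set(line.strip() for line in infile if line.strip())


def map_lines(lines, user_ids):
    """Yield 'user_id<TAB>1' for each line whose first field is a known ID.

    Assumes tab-separated input with the user ID in the first column.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] in user_ids:
            yield "%s\t1" % fields[0]


def main():
    # Hadoop streaming feeds input records on stdin and reads results from stdout.
    for record in map_lines(sys.stdin, load_user_ids()):
        print(record)
```

In an actual mapper.py the script would end with a call to main(), typically behind an `if __name__ == "__main__":` guard. Loading the file once into a set keeps the per-record lookup cheap.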
Can anyone give me some pointers? Thanks.
Answer 0 (score: 1)
Posting the solution I arrived at here, in the hope that it helps others. Using the AWS CLI for EMR, the command looks like this:
aws emr create-cluster \
--name "cluster--name" \
--enable-debugging \
--log-uri s3://bucket-name/logs \
--ami-version 3.7.0 \
--use-default-roles \
--ec2-attributes KeyName=your-key \
--instance-type m3.xlarge \
--instance-count 3 \
--auto-terminate \
--steps file://./streaming.json
The streaming.json file looks like this:
[
  {
    "Type": "STREAMING",
    "Name": "Streaming program",
    "ActionOnFailure": "TERMINATE_CLUSTER",
    "Args": [
      "-files", "s3://bucket-name/code/mapper.py,s3://bucket-name/code/reducer.py",
      "-mapper", "mapper.py",
      "-reducer", "reducer.py",
      "-input", "s3://bucket-name/input",
      "-output", "s3://bucket-name/output",
      "-cacheFile", "s3://bucket-name/code/data-file-name#new-file-name"
    ]
  }
]
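The part after "#" in the -cacheFile value is the name under which Hadoop streaming links the file into each task's working directory, so the mapper can open it as ./new-file-name. If you would rather generate the steps file than write it by hand, a small Python sketch along these lines should work. The helper names cache_file_arg and streaming_step are hypothetical, and the bucket and file names are the same placeholders as above.

```python
import json


def cache_file_arg(s3_uri, link_name):
    """Join an S3 URI and a local link name with '#', the form -cacheFile takes."""
    if "#" in s3_uri or "#" in link_name or "/" in link_name:
        raise ValueError("link name must be a bare file name and contain no '#'")
    return "%s#%s" % (s3_uri, link_name)


def streaming_step(code_prefix, input_uri, output_uri, cache_uri, link_name):
    """Build one STREAMING step with the same argument order as the JSON above."""
    return {
        "Type": "STREAMING",
        "Name": "Streaming program",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "Args": [
            # -files is a generic Hadoop option, so it stays ahead of the
            # streaming-specific options.
            "-files", "%s/mapper.py,%s/reducer.py" % (code_prefix, code_prefix),
            "-mapper", "mapper.py",
            "-reducer", "reducer.py",
            "-input", input_uri,
            "-output", output_uri,
            "-cacheFile", cache_file_arg(cache_uri, link_name),
        ],
    }


steps = [
    streaming_step(
        code_prefix="s3://bucket-name/code",
        input_uri="s3://bucket-name/input",
        output_uri="s3://bucket-name/output",
        cache_uri="s3://bucket-name/code/data-file-name",
        link_name="new-file-name",
    )
]
steps_json = json.dumps(steps, indent=2)
```

Writing steps_json to streaming.json gives a file that can be passed to --steps file://./streaming.json as in the command above.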