I'm trying to run a step on Amazon Elastic MapReduce with Hadoop using the AWS SDK for Ruby. I can create the cluster and add the step, but the step always fails, whereas it succeeds when I set it up manually through the web interface.
emr = Aws::EMR::Client.new
cluster_id = "*******"
resp = emr.add_job_flow_steps({
job_flow_id: cluster_id, # required
steps: [ # required
{
name: "TestStep", # required
action_on_failure: "CANCEL_AND_WAIT", # accepts TERMINATE_JOB_FLOW, TERMINATE_CLUSTER, CANCEL_AND_WAIT, CONTINUE
hadoop_jar_step: { # required
jar: 'command-runner.jar',
args: [
"-files",
"s3://source123/mapper.py,s3://source123/source_reducer.py",
"-mapper",
"mapper.py",
"-reducer",
"source_reducer.py",
"-input",
"s3://source123/input/",
"-output",
"s3://source123/output/"
]
},
},
],
})
The error I get is this:
Cannot run program "-files" (in directory "."): error=2, No such file or directory
Any clues?
Answer 0 (score: 0)
It seems the fix is to add "hadoop-streaming" as the first argument. command-runner.jar executes its first argument as a command, so without it, "-files" was being treated as the program to run (hence the Cannot run program "-files" error). This works:
emr = Aws::EMR::Client.new
cluster_id = "*******"
resp = emr.add_job_flow_steps({
job_flow_id: cluster_id, # required
steps: [ # required
{
name: "TestStep", # required
action_on_failure: "CANCEL_AND_WAIT", # accepts TERMINATE_JOB_FLOW, TERMINATE_CLUSTER, CANCEL_AND_WAIT, CONTINUE
hadoop_jar_step: { # required
jar: 'command-runner.jar',
args: [
"hadoop-streaming",
"-files",
"s3://source123/mapper.py,s3://source123/source_reducer.py",
"-mapper",
"mapper.py",
"-reducer",
"source_reducer.py",
"-input",
"s3://source123/input/",
"-output",
"s3://source123/output/"
]
},
},
],
})
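As a sketch of the key point: the only difference between the failing and working versions is the leading "hadoop-streaming" token, since command-runner.jar runs its first argument as a command. A small helper (the method name and keyword arguments here are illustrative, not part of the AWS SDK) could build such a step hash and make that invariant explicit:

```ruby
# Hypothetical helper that builds a step hash for a Hadoop streaming job.
# command-runner.jar executes its first argument as a command, so
# "hadoop-streaming" must come before any streaming options like -files.
def streaming_step(name:, mapper:, reducer:, files:, input:, output:)
  {
    name: name,
    action_on_failure: "CANCEL_AND_WAIT",
    hadoop_jar_step: {
      jar: "command-runner.jar",
      args: [
        "hadoop-streaming",          # the command itself, never a flag
        "-files", files.join(","),   # comma-separated list of S3 paths
        "-mapper", mapper,
        "-reducer", reducer,
        "-input", input,
        "-output", output
      ]
    }
  }
end

step = streaming_step(
  name: "TestStep",
  mapper: "mapper.py",
  reducer: "source_reducer.py",
  files: ["s3://source123/mapper.py", "s3://source123/source_reducer.py"],
  input: "s3://source123/input/",
  output: "s3://source123/output/"
)
puts step[:hadoop_jar_step][:args].first  # → hadoop-streaming
```

The resulting hash can then be passed in the steps: array of add_job_flow_steps, keeping the "command first, flags after" rule in one place.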