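I have used script-runner on AWS EMR. This may seem like a very basic (maybe stupid) question, but I have read a lot of documentation and none of it explains why we need the script runner in EMR when all it does is execute a script on the master node. Can't the same script be run with bash?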
Answer 0 (score: 5)
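Script runner is needed when you simply want to execute a script but the entry point expects a jar. For example, submitting an EMR step executes a "hadoop jar blah ..." command, and if "blah" is a script rather than a jar, that command fails. Script runner becomes the jar that the step expects, and then uses its argument (the path to the script) to execute the shell script.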
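To make this concrete, here is a minimal sketch of what such a step definition could look like, written as a plain Python dict in the shape that boto3's EMR client accepts for steps. The helper name `script_runner_step`, the bucket `my-bucket`, and the script `setup.sh` are hypothetical; the jar location assumes the regional `s3://<region>.elasticmapreduce/libs/script-runner/script-runner.jar` pattern from the EMR documentation, so check it for your region.

```python
def script_runner_step(script_s3_path, script_args=(), region="us-east-1",
                       name="Run shell script"):
    """Build an EMR step that runs a shell script through script-runner.jar.

    The step's entry point must be a jar, so script-runner.jar fills that
    role and receives the script path as its first argument.
    """
    # Assumed regional location of the script-runner jar (verify for your region).
    jar = f"s3://{region}.elasticmapreduce/libs/script-runner/script-runner.jar"
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": jar,
            # First argument: the script to run; the rest are passed to the script.
            "Args": [script_s3_path, *script_args],
        },
    }


# Hypothetical bucket and script, for illustration only.
step = script_runner_step("s3://my-bucket/scripts/setup.sh", ["--verbose"])
```

A dict like this could then be passed in the `Steps` list of boto3's `add_job_flow_steps` call, assuming you already have a running cluster and its job flow ID.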
Answer 1 (score: 2)
When you run your script with bash, the script has to be available locally, and you have to set up all the configuration yourself for it to work as you expect.
With script-runner you have more options: for example, you can run the script as part of your cluster launch command, and you can execute a script that is hosted remotely in S3. See the example in the EMR documentation: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-script.html