I am writing a Python job on AWS EMR.
Creating the EMR cluster:
```python
import boto3

# Region assumed to match the us-east-1 script-runner.jar bucket below
client = boto3.client('emr', region_name='us-east-1')

# Runs the Python script inside; all arguments and options need to be passed through to it
script_path = 's3://test/test.sh'
# bootstrap_path (S3 path of the bootstrap script) is defined elsewhere

response = client.run_job_flow(
    Name="test",
    ReleaseLabel='emr-5.19.0',
    Instances={
        'MasterInstanceType': 'c3.8xlarge',
        'InstanceCount': 1,
    },
    Steps=[
        {
            'Name': 'Run salecount forecast',
            'ActionOnFailure': 'CONTINUE',
            'HadoopJarStep': {
                'Jar': 's3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar',
                'Args': [script_path, 'safestock -s s3']
            }
        }
    ],
    BootstrapActions=[
        {
            'Name': 'job_on_create',
            'ScriptBootstrapAction': {
                'Path': bootstrap_path,
                'Args': []
            }
        },
    ],
    VisibleToAllUsers=True,
    JobFlowRole='EMR_EC2_DefaultRole',
    ServiceRole='EMR_DefaultRole'
)
```
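A minimal sketch of building the step so that each token of the desired command becomes its own element of `Args`, consistent with the observation below that every element after the script path arrives as a separate positional parameter. The helper name `make_script_step` is hypothetical; it uses only `shlex.split` from the standard library.

```python
import shlex

SCRIPT_RUNNER_JAR = 's3://us-east-1.elasticmapreduce/libs/script-runner/script-runner.jar'


def make_script_step(name, script_path, command):
    """Hypothetical helper: return a Steps entry for script-runner.jar.

    Each whitespace-separated token of `command` (honoring shell-style
    quoting) becomes its own element of Args after the script path.
    """
    return {
        'Name': name,
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': SCRIPT_RUNNER_JAR,
            # shlex.split('safestock -s s3') -> ['safestock', '-s', 's3']
            'Args': [script_path] + shlex.split(command),
        },
    }


step = make_script_step('Run salecount forecast', 's3://test/test.sh', 'safestock -s s3')
print(step['HadoopJarStep']['Args'])
```

The resulting dict could be placed in the `Steps` list of `run_job_flow`; the shell script still has to forward all of those tokens, not just `$1`.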
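The step executes a bash script (it makes some environment changes, git-clones the Python project, and runs it with a command).
The script looks like this, for example: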
```bash
git clone xxx
...
...
cd xxx
python run.py $1  # here I would actually like `python run.py safestock -s s3`, or `python run.py calcsomething -s hdfs -n 12`, or something else
```
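To show why the split matters on the Python side, here is a hypothetical sketch of how `run.py` might parse the example commands with `argparse`. The subcommand and option names come from the example commands; the long option `--source` and the parser layout are assumptions, not the project's actual code.

```python
import argparse


def build_parser():
    # Hypothetical CLI layout for run.py, inferred from the example commands
    parser = argparse.ArgumentParser(prog='run.py')
    sub = parser.add_subparsers(dest='command')

    safestock = sub.add_parser('safestock')
    safestock.add_argument('-s', '--source')  # e.g. s3 or hdfs

    calc = sub.add_parser('calcsomething')
    calc.add_argument('-s', '--source')
    calc.add_argument('-n', type=int)
    return parser


args = build_parser().parse_args(['calcsomething', '-s', 'hdfs', '-n', '12'])
print(args.command, args.source, args.n)
```

`argparse` expects one token per `sys.argv` element: `['safestock', '-s', 's3']` parses, while a single element `'safestock -s s3'` is rejected as an invalid subcommand.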
The question is:
how can I pass arguments and options through script-runner.jar
to the Python script inside the bash script?
`'Args': [script_path, 'safestock -s s3']`
does not seem to work, and
`'Args': [script_path, 'safestock', '-s', 's3']`
only reads `safestock`
into `$1`.
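The `$1` behavior described above is standard POSIX shell semantics and can be reproduced locally. This sketch (assuming `sh` is on the PATH and Python 3.7+) runs a one-line script with the same three tokens and prints what `"$1"` versus `"$@"` expands to; `positional_args_seen` is a hypothetical helper name.

```python
import subprocess


def positional_args_seen(body, *args):
    """Run `body` with sh, passing args as $1, $2, ...; return the printed lines."""
    # 'test.sh' fills $0, as the script's own name would
    result = subprocess.run(
        ['sh', '-c', body, 'test.sh', *args],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


# Only the first token lands in $1
print(positional_args_seen('printf "%s\\n" "$1"', 'safestock', '-s', 's3'))
# "$@" expands to every positional parameter, each kept as a separate word
print(positional_args_seen('printf "%s\\n" "$@"', 'safestock', '-s', 's3'))
```

In terms of `test.sh`, this suggests writing `python run.py "$@"` instead of `python run.py $1`, so that every token passed in `Args` reaches `run.py`.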