How can I programmatically get the application ID from the shell script spark-submit.sh? When the driver is created, the JSON response does not return an application ID. How can I access the ID of the application that was created? Thanks in advance.
Answer 0 (score: 1)
You can wrap the spark-submit command in a Python wrapper script, as shown below. It intercepts stderr and extracts the application ID from it. As soon as the application ID is detected, the wrapper prints it to stderr and exits with return code 0. You can then call the wrapper from your bash script and collect the application ID.
spark-submit.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
from __future__ import print_function
import re
import subprocess
import sys
import os
def spark_submit():
    if len(sys.argv) < 2:
        print('Please enter the spark-submit command as the argument.')
        sys.exit(-1)

    # Expand environment variables in every argument and launch spark-submit,
    # capturing its stderr so it can be scanned for the application ID.
    process = subprocess.Popen(
        [os.path.expandvars(x) for x in sys.argv[1:]],
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )

    for line in iter(lambda: process.stderr.readline(), ''):
        # Echo the spark-submit logs to stdout so they remain visible.
        print(line.strip())
        match = re.search('Submitted application (.*)$', line)
        if match:
            # Write only the application ID to stderr so the caller can capture it.
            print(match.groups()[0], file=sys.stderr)
            process.kill()
            sys.exit(0)
    sys.exit(1)


if __name__ == "__main__":
    spark_submit()
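For reference, the regular expression above targets the line that the YARN client writes to stderr once the job is accepted. On most Spark versions it looks roughly like the following (the application ID here is made up for illustration):

    INFO yarn.Client: Submitted application application_1530000000000_0042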
Your bash script would then look something like this:
#!/usr/bin/env bash
# If you have any third party libraries to be supplied
export LIB_PATH=/tmp/app/lib
application_id="$(
  python spark-submit.py spark-submit \
    --class com.stackoverflow.spark.AppDriver \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 24 \
    --executor-cores 2 \
    --executor-memory 2G \
    --jars ${LIB_PATH}/spark-csv_2.11-1.5.0.jar,${LIB_PATH}/commons-csv-1.1.jar,${LIB_PATH}/univocity-parsers-2.7.4.jar,${LIB_PATH}/scopt_2.10-3.2.0.jar \
    --conf spark.port.maxRetries=108 \
    --conf spark.app.name=YourSparkAppName \
    ${LIB_PATH}/spark-application-1.0.0.jar \
    --local-input-dir /tmp/data/input \
    --hdfs-input-dir /hdfs/data/input \
    --hdfs-archive-dir /hdfs/data/archive \
    --input-file-header false 2>&1 > /dev/null)"
echo "application_id: $application_id"
Thanks!