How can I programmatically get the application ID from the shell script spark-submit.sh? When the driver is created, the JSON response does not return an application ID. How can I access the ID of the application that was created? Thanks in advance.
Answer 0 (score: 1)
You can wrap the spark-submit command in a Python wrapper script, as shown below. It intercepts stderr and extracts the application ID from it. As soon as the application ID is detected, the wrapper prints it to stderr and exits with return code 0. You can then call the wrapper from your bash script and collect the application ID.
spark-submit.py
#!/usr/bin/python
# -*- coding: utf-8 -*-
from __future__ import print_function
import re
import subprocess
import sys
import os
def spark_submit():
    if len(sys.argv) < 2:
        print('Please enter the spark-submit command as the argument.')
        sys.exit(-1)

    # Expand environment variables in every argument and launch spark-submit,
    # capturing its stderr so it can be scanned for the application ID.
    process = subprocess.Popen(
        [os.path.expandvars(x) for x in sys.argv[1:]],
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )

    for line in iter(lambda: process.stderr.readline(), ''):
        # Echo the spark-submit logs to stdout so they remain visible.
        print(line.strip())
        match = re.search('Submitted application (.*)$', line)
        if match:
            # Write only the application ID to stderr so the caller can capture it.
            print(match.groups()[0], file=sys.stderr)
            process.kill()
            sys.exit(0)
    sys.exit(1)


if __name__ == "__main__":
    spark_submit()
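For reference, the regular expression above targets the line that the YARN client writes to stderr once the job is accepted. On most Spark versions it looks roughly like the following (the application ID here is made up for illustration):

    INFO yarn.Client: Submitted application application_1530000000000_0042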
Your bash script would then look something like this:
#!/usr/bin/env bash
# If you have any third party libraries to be supplied
export LIB_PATH=/tmp/app/lib
application_id="$(
  python spark-submit.py spark-submit \
    --class com.stackoverflow.spark.AppDriver \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 24 \
    --executor-cores 2 \
    --executor-memory 2G \
    --jars ${LIB_PATH}/spark-csv_2.11-1.5.0.jar,${LIB_PATH}/commons-csv-1.1.jar,${LIB_PATH}/univocity-parsers-2.7.4.jar,${LIB_PATH}/scopt_2.10-3.2.0.jar \
    --conf spark.port.maxRetries=108 \
    --conf spark.app.name=YourSparkAppName \
    ${LIB_PATH}/spark-application-1.0.0.jar \
    --local-input-dir /tmp/data/input \
    --hdfs-input-dir /hdfs/data/input \
    --hdfs-archive-dir /hdfs/data/archive \
    --input-file-header false 2>&1 > /dev/null)"
echo "application_id: $application_id"
Thanks!