Killing an Apache Spark job from the web UI does not kill its Python subprocess

Date: 2018-09-20 05:30:56

Tags: python python-3.x apache-spark subprocess

My PySpark code launches another Python job with subprocess.Popen(command).

I tried to kill the SparkContext manually through the Spark master web UI at http://localhost:8080. The kill succeeded, but the Python subprocess that had already been spawned kept running as a python process on the worker node.

I am running on Red Hat Linux.

How can I kill the Python subprocess when the PySpark SparkContext is killed?
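
For reference, a minimal sketch of the setup as I understand it (file names such as other_job.py are placeholders, not taken from the question): the child started with subprocess.Popen is an independent OS process, so killing the Spark application does not automatically kill it.

import subprocess
from pyspark import SparkConf, SparkContext

sc = SparkContext(conf=SparkConf().setAppName("parent-job"))

# Launch another Python job as a separate OS process.
child = subprocess.Popen(["python3", "other_job.py"])

# ... Spark work ...

# If the application is killed from the master web UI, this process dies,
# but 'child' keeps running because nothing terminates it.
child.wait()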

2 Answers:

Answer 0 (score: 0)

In general, reliably killing a subprocess is hard, because the subprocess may be executing uninterruptible code at the moment you want to kill it. That said, it sounds like a "best effort" approach would suit your situation. You will want to create the subprocess and then wait on it, so that you can clean up if the wait is interrupted. The simplest way is to wrap the wait in a try/finally block:

import subprocess

print("starting subprocess")
x = subprocess.Popen(["sleep", "100000"])
try:
    # If the wait is interrupted (e.g. the driver is killed),
    # the finally clause still terminates the child.
    x.wait()
finally:
    print("stopping subprocess")
    x.terminate()

I believe Spark will send an interrupt signal when the job is killed.
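
The answer does not say which signal that is. A belt-and-braces sketch, assuming the driver process receives SIGTERM or SIGINT when the application is killed (the handler name _forward_termination is made up for illustration):

import signal
import subprocess
import sys

child = subprocess.Popen(["sleep", "100000"])

def _forward_termination(signum, frame):
    # Terminate the child first, then exit; sys.exit() raises SystemExit,
    # so any try/finally blocks in the main thread still run.
    child.terminate()
    sys.exit(128 + signum)

# Signals a cluster manager commonly sends when an application is killed.
signal.signal(signal.SIGTERM, _forward_termination)
signal.signal(signal.SIGINT, _forward_termination)

child.wait()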

Answer 1 (score: 0)

The code below works for me:

from subprocess import Popen, PIPE, CalledProcessError
from contextlib import contextmanager
from pyspark import SparkContext
from pyspark import SparkConf
import sys, os, subprocess, signal, time

@contextmanager
def spark_manager():
    conf = SparkConf().setAppName("TEST-SPARK")
    conf.set("spark.scheduler.mode", "FAIR")
    sc = SparkContext(conf=conf)

    try:
        yield sc
    finally:
        sc.stop()

with spark_manager() as context:
    command = ['python3', 'test.py']
    # universal_newlines=True makes readline() return str rather than bytes
    process = subprocess.Popen(command, shell=False,
                               stdout=subprocess.PIPE,
                               stderr=subprocess.STDOUT,
                               universal_newlines=True)

    # Poll the process for new output until it finishes,
    # killing it if the SparkContext has been stopped
    while True:
        if context._jsc.sc().isStopped():
            print(process.pid)
            time.sleep(1.0)
            os.kill(process.pid, signal.SIGKILL)
            break
        nextline = process.stdout.readline()
        if nextline == '' and process.poll() is not None:
            break
        sys.stdout.write(nextline)
        sys.stdout.flush()

    output = process.communicate()[0]
    exitCode = process.returncode

    if exitCode == 0:
        print(output)
    else:
        raise CalledProcessError(exitCode, command, output=output)
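
One note on the design: os.kill(..., signal.SIGKILL) gives test.py no chance to clean up. A gentler variant, assuming the child handles SIGTERM, is to ask it to exit first and only escalate to SIGKILL after a grace period (stop_child is an illustrative helper, not part of the answer above):

import subprocess

def stop_child(process, grace_seconds=5.0):
    """Ask the child to exit (SIGTERM), then force-kill it if it lingers."""
    process.terminate()
    try:
        process.wait(timeout=grace_seconds)   # give it time to clean up
    except subprocess.TimeoutExpired:
        process.kill()                        # SIGKILL as a last resort
        process.wait()                        # reap the child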