My PySpark code calls another Python job using subprocess.Popen(command).
I tried killing the SparkContext manually through the Spark master web UI at http://localhost:8080, and it was killed successfully
while the Python subprocess it had spawned was still running as a Python process on a worker node.
I am using Red Hat Linux.
How do I kill the Python subprocess if I kill the PySpark SparkContext?
Answer 0 (score: 0)
In general, reliably killing a subprocess is hard, because the subprocess can be executing uninterruptible code at the moment you want to kill it. That said, it sounds like a best-effort approach may be fine for your situation. You will want to create and wait on the subprocess so that you can clean it up when your own process is interrupted. The simplest way is to put the subprocess inside a try/finally block.
import subprocess

try:
    print("starting subprocess")
    x = subprocess.Popen(["sleep", "100000"])
    # Block until the child exits (or until this process is interrupted).
    x.wait()
finally:
    # Runs on normal exit and on an interrupt such as KeyboardInterrupt.
    print("stopping subprocess")
    x.terminate()
I believe Spark will send an interrupt signal.
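If the signal that arrives is SIGTERM rather than SIGINT, the finally block will not run by default, because Python terminates immediately on an unhandled SIGTERM. A minimal best-effort sketch of a safety net (assuming a POSIX worker; the handler name _handle_sigterm is chosen here just for illustration) that converts SIGTERM into a normal Python exception so the cleanup above still fires:

import signal
import subprocess
import sys

def _handle_sigterm(signum, frame):
    # Turn SIGTERM into SystemExit so the try/finally cleanup still runs.
    sys.exit(1)

signal.signal(signal.SIGTERM, _handle_sigterm)

x = None
try:
    print("starting subprocess")
    x = subprocess.Popen(["sleep", "100000"])
    x.wait()
finally:
    print("stopping subprocess")
    if x is not None:
        x.terminate()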
Answer 1 (score: 0)
The code below worked for me:
from subprocess import Popen, PIPE, CalledProcessError
from contextlib import contextmanager
from pyspark import SparkContext
from pyspark import SparkConf
import sys, os, subprocess, signal, time

@contextmanager
def spark_manager():
    conf = SparkConf().setAppName("TEST-SPARK")
    conf.set("spark.scheduler.mode", "FAIR")
    sc = SparkContext(conf=conf)
    try:
        yield sc
    finally:
        # Always stop the context, even if the block below raises.
        sc.stop()

with spark_manager() as context:
    # universal_newlines=True makes stdout yield str, so the '' comparison below works.
    process = subprocess.Popen(['python3', 'test.py'], shell=False,
                               stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                               universal_newlines=True)

    # Poll the child for new output until it finishes or the SparkContext dies.
    while True:
        if context._jsc.sc().isStopped():
            # The SparkContext was stopped (e.g. killed from the web UI):
            # kill the child process as well.
            print(process.pid)
            time.sleep(1.0)
            os.kill(process.pid, signal.SIGKILL)
            break

        nextline = process.stdout.readline()
        if nextline == '' and process.poll() is not None:
            break
        sys.stdout.write(nextline)
        sys.stdout.flush()

    output = process.communicate()[0]
    exitCode = process.returncode

    if exitCode == 0:
        print(output)
    else:
        # The original raised an undefined ProcessException; CalledProcessError
        # (already imported above) carries the same information.
        raise CalledProcessError(exitCode, process.args, output)
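The key idea is that the driver polls context._jsc.sc().isStopped() between reads of the child's output; as soon as the SparkContext is reported stopped (for example after being killed from the master web UI), the loop sends SIGKILL to the child's pid and breaks out. Note that _jsc is an internal PySpark attribute, so this check may need adjusting across Spark versions, and os.kill only reaches the direct child: if test.py spawns children of its own, you would likely need a process group (e.g. launching with start_new_session=True and cleaning up with os.killpg) to stop those as well.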