Killing a Spark Job Programmatically

Posted: 2017-04-05 16:05:51

Tags: python apache-spark pyspark jupyter-notebook

I'm running a PySpark application through a Jupyter notebook. I can kill a job using the Spark Web UI, but I want to kill it programmatically.

How can I kill it?

2 answers:

Answer 0: (score: 0)

Assuming you wrote this code:

from pyspark import SparkContext

sc = SparkContext("local", "Simple App")

# This will stop your app
sc.stop()

As described in the documentation: http://spark.apache.org/docs/latest/api/python/pyspark.html?highlight=stop#pyspark.SparkContext.stop

Answer 1: (score: 0)

To expand on @Netanel Malka's answer, you can use the cancelAllJobs method to cancel every running job, or the cancelJobGroup method to cancel jobs that have been organized into a group.

From the PySpark documentation:

cancelAllJobs()
Cancel all jobs that have been scheduled or are running.

cancelJobGroup(groupId)
Cancel active jobs for the specified group. See SparkContext.setJobGroup for more information.

And the example from the documentation (it assumes an existing SparkContext named sc, as you would already have in a notebook session):

import threading
from time import sleep
result = "Not Set"
lock = threading.Lock()

def map_func(x):
    sleep(100)
    raise Exception("Task should have been cancelled")

def start_job(x):
    global result
    try:
        sc.setJobGroup("job_to_cancel", "some description")
        result = sc.parallelize(range(x)).map(map_func).collect()
    except Exception as e:
        result = "Cancelled"
    lock.release()

def stop_job():
    sleep(5)
    sc.cancelJobGroup("job_to_cancel")

suppress = lock.acquire()
suppress = threading.Thread(target=start_job, args=(10,)).start()
suppress = threading.Thread(target=stop_job).start()
suppress = lock.acquire()
print(result)
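
If you don't want to bother with job groups, a minimal sketch of the same cancel-from-another-thread idea using cancelAllJobs could look like the following. It assumes an existing SparkContext named sc (as in a notebook session); the 5-second delay and the long_task helper are arbitrary choices for illustration.

import threading
from time import sleep

def long_task(x):
    # Stand-in for real work that takes too long.
    sleep(100)
    return x

def cancel_after(seconds):
    # Wait a bit, then cancel every scheduled or running job on this context.
    sleep(seconds)
    sc.cancelAllJobs()

# Start the watchdog thread, then kick off the job; collect() raises
# once the job is cancelled, so catch the exception.
threading.Thread(target=cancel_after, args=(5,)).start()
try:
    sc.parallelize(range(10)).map(long_task).collect()
except Exception:
    print("Job was cancelled")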