
Question

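Spark's architecture revolves entirely around the concepts of executors and cores. I would like to see how many executors and cores my Spark application is actually using while it runs on a cluster.

I tried the snippet below in my application, but with no luck.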

val conf = new SparkConf().setAppName("ExecutorTestJob")
val sc = new SparkContext(conf)
conf.get("spark.executor.instances")
conf.get("spark.executor.cores")

Is there a way to get these values from the SparkContext object, the SparkConf object, or something similar?

Answer 1

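Scala (programmatic way):

Both getExecutorStorageStatus and getExecutorMemoryStatus return the registered executors including the driver, so the driver has to be filtered out, as in the example snippet below.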

import org.apache.spark.SparkContext

/** Returns the currently active/registered executors, excluding the driver.
  *
  * @param sc the SparkContext to retrieve registered executors from
  * @return a list of executors, each in the form host:port
  */
def currentActiveExecutors(sc: SparkContext): Seq[String] = {
  // getExecutorMemoryStatus is keyed by "host:port" and includes the driver
  val allExecutors = sc.getExecutorMemoryStatus.keys
  val driverHost: String = sc.getConf.get("spark.driver.host")
  allExecutors.filter(_.split(":")(0) != driverHost).toList
}

sc.getConf.getInt("spark.executor.instances", 1)

Similarly, you can get all the properties and print them as shown below; the output also includes the core settings.

sc.getConf.getAll.mkString("\n")

OR

sc.getConf.toDebugString

The core counts are usually found in spark.executor.cores for the executors and spark.driver.cores for the driver.
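As a rough illustration of pulling just the core settings out of the full property dump, here is a minimal Python sketch. The helper name `core_settings` is hypothetical, and the fallback of "1" for a missing key is an assumption (the effective default depends on the cluster manager). It works on plain (key, value) pairs, which is the shape PySpark's `sc.getConf().getAll()` returns.

```python
def core_settings(conf_pairs, default="1"):
    """Pick the executor and driver core settings out of (key, value) pairs.

    conf_pairs: iterable of (key, value) string pairs, e.g. sc.getConf().getAll()
    default:    assumed fallback when a key is not set explicitly
    """
    settings = dict(conf_pairs)
    return {
        "spark.executor.cores": int(settings.get("spark.executor.cores", default)),
        "spark.driver.cores": int(settings.get("spark.driver.cores", default)),
    }
```

With a live context this could be called as `core_settings(sc.getConf().getAll())`. Keys that were never set explicitly fall back to the assumed default rather than raising an error.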

Python:

The methods above, getExecutorStorageStatus and getExecutorMemoryStatus, are not implemented in the Python API.

EDIT: They can, however, be reached through the Py4J bindings exposed by the SparkSession.

sc._jsc.sc().getExecutorMemoryStatus()
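A small sketch of how that Py4J call might be turned into an executor count. The helper name `executor_count` is hypothetical, and it assumes the returned map holds exactly one entry for the driver, as described for the Scala API above. Note that `_jsc` is a private attribute and may change between Spark versions.

```python
def executor_count(sc):
    """Number of registered executors, excluding the driver (assumed to be one entry)."""
    # getExecutorMemoryStatus returns a Scala Map keyed by "host:port";
    # size() is called on the JVM object through Py4J.
    status = sc._jsc.sc().getExecutorMemoryStatus()
    # Subtract the driver's own entry; never report a negative count.
    return max(status.size() - 1, 0)
```

In local mode the map contains only the driver, so this sketch would report 0 executors.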

Answer 2

This is an old question, but this is my code for figuring this out on Spark 2.3.0:

executor_count = len(spark.sparkContext._jsc.sc().statusTracker().getExecutorInfos()) - 1
cores_per_executor = int(spark.sparkContext.getConf().get('spark.executor.cores', '1'))
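The two values can be combined into a total core count. The sketch below is only an illustration: the function name `allocation_summary` is hypothetical, it takes the raw numbers as arguments so it can be tried without a cluster, and it assumes every executor was granted the same number of cores.

```python
def allocation_summary(executor_infos_len, cores_conf_value):
    """Summarise executors and cores from the two values read in the answer above.

    executor_infos_len: len(...getExecutorInfos()), which includes the driver
    cores_conf_value:   the spark.executor.cores setting as a string or int
    """
    executors = max(executor_infos_len - 1, 0)  # drop the driver entry
    cores_per_executor = int(cores_conf_value)
    return {
        "executors": executors,
        "cores_per_executor": cores_per_executor,
        # Assumes all executors have the same number of cores.
        "total_cores": executors * cores_per_executor,
    }
```

For example, five entries from getExecutorInfos() with spark.executor.cores set to '4' would give 4 executors and 16 cores in total.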

Answer 3

Here is a Python example for getting the number of cores (including the master's):

def workername():
    import socket
    return str(socket.gethostname())

anrdd = sc.parallelize(['', ''])
namesRDD = anrdd.flatMap(lambda e: (1, workername()))
namesRDD.count()

Spark - How many executors and cores are allocated to my Spark job
