ContextCleaner: what does "Cleaned accumulator" mean in Scala Spark?

Time: 2019-04-01 10:24:17

Tags: apache-spark

When I run my Spark program I see the output below, and the job is slow to finish. What does this output mean?

19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 82
19/04/01 15:34:24 INFO ContextCleaner: Cleaned shuffle 0
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 69
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 30
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 40
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 61
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 41
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 52
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 29
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 31
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 57
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 60
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 87
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 79
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 78
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 84
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 34
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 49
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 75
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 88
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 48

This is the build.sbt I am using:

name := "BigData"

version := "0.1"

scalaVersion := "2.11.12"

libraryDependencies += "com.github.tototoshi" %% "scala-csv" % "1.3.5"
// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0"

// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0"
// https://mvnrepository.com/artifact/com.microsoft.sqlserver/mssql-jdbc
libraryDependencies += "com.microsoft.sqlserver" % "mssql-jdbc" % "6.1.0.jre8"
libraryDependencies += "com.databricks" % "spark-xml_2.11" % "0.4.1"

// https://mvnrepository.com/artifact/com.typesafe.akka/akka-actor
libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.5.19"
// https://mvnrepository.com/artifact/com.typesafe.akka/akka-http
libraryDependencies += "com.typesafe.akka" %% "akka-http" % "10.1.5"
// https://mvnrepository.com/artifact/com.typesafe.akka/akka-stream
libraryDependencies += "com.typesafe.akka" %% "akka-stream" % "2.5.19"

// https://mvnrepository.com/artifact/org.apache.livy/livy-core
libraryDependencies += "org.apache.livy" %% "livy-core" % "0.5.0-incubating"

dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.9.4"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.9.4"
dependencyOverrides += "com.fasterxml.jackson.module" % "jackson-module-scala_2.11" % "2.9.4"
// https://mvnrepository.com/artifact/net.liftweb/lift-json
libraryDependencies += "net.liftweb" %% "lift-json" % "3.2.0"

// https://mvnrepository.com/artifact/org.json4s/json4s-jackson
libraryDependencies += "org.json4s" %% "json4s-jackson" % "3.6.5"

// https://mvnrepository.com/artifact/org.json4s/json4s-native
libraryDependencies += "org.json4s" %% "json4s-native" % "3.6.5"

// https://mvnrepository.com/artifact/oracle/xdb

//libraryDependencies += "oracle" % "xdb" % "1.0"

2 answers:

Answer 0: (score: 2)

You can use the properties below to disable ContextCleaner:

spark.cleaner.referenceTracking false
spark.cleaner.referenceTracking.blocking false
spark.cleaner.referenceTracking.blocking.shuffle false
spark.cleaner.referenceTracking.cleanCheckpoints false 

However, if you are running on 2.1, you do not need to set these properties explicitly.
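The same properties can also be set programmatically before the SparkContext is created. The following is a minimal sketch, not the only way to do it: the object name `CleanerConf`, the app name, and the `local[*]` master are assumptions made for illustration, while the property keys are the ones listed in this answer.

```scala
import org.apache.spark.SparkConf

object CleanerConf {
  // Builds a SparkConf carrying the ContextCleaner properties listed above.
  def build(): SparkConf =
    new SparkConf()
      .setAppName("BigData")   // assumption: app name borrowed from the question's build.sbt
      .setMaster("local[*]")   // assumption: local run, replace with your cluster master
      .set("spark.cleaner.referenceTracking", "false")
      .set("spark.cleaner.referenceTracking.blocking", "false")
      .set("spark.cleaner.referenceTracking.blocking.shuffle", "false")
      .set("spark.cleaner.referenceTracking.cleanCheckpoints", "false")
}
```

The resulting configuration would be passed to `new SparkContext(CleanerConf.build())` or to `SparkSession.builder().config(...)`. Keep in mind that with reference tracking disabled, shuffle files, broadcasts, and accumulators are no longer cleaned up automatically, which may matter for long-running applications.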

You can find more information here:

https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-service-contextcleaner.html

Answer 1: (score: 0)

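ContextCleaner runs on the driver. It is created and started immediately when SparkContext starts. Its cleaner thread cleans up RDD, shuffle, and broadcast state as well as accumulators (using the keepCleaning method). A second thread, context-cleaner-periodic-gc, periodically requests a JVM garbage collection; these periodic runs start when ContextCleaner starts and stop when ContextCleaner stops.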
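To see where a "Cleaned accumulator" line can come from, here is a small sketch under stated assumptions: the object `CleanerDemo`, the accumulator name, and the `local[*]` master are hypothetical, and Spark 2.4 is assumed to be on the classpath as in the question's build.sbt. The accumulator is referenced only inside a helper method, so it becomes unreachable on the driver once the method returns and is then eligible for cleanup by ContextCleaner.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object CleanerDemo {
  // The accumulator is only referenced inside this method, so the driver
  // holds no reference to it after the method returns.
  def sumWithAccumulator(sc: SparkContext): Long = {
    val acc = sc.longAccumulator("demo") // hypothetical accumulator name
    sc.parallelize(1 to 100).foreach(x => acc.add(x))
    acc.value
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CleanerDemo")
      .master("local[*]") // assumption: local run
      .getOrCreate()

    println(sumWithAccumulator(spark.sparkContext))

    // Only a request to the JVM: a collection, and therefore the
    // "Cleaned accumulator" log line, is not guaranteed to follow.
    System.gc()
    Thread.sleep(2000) // give the cleaner thread a moment before shutdown
    spark.stop()
  }
}
```

If a collection does happen, the cleaner thread may log a "Cleaned accumulator N" line at INFO level. Such lines report routine housekeeping on the driver.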