当运行我的spark程序时,我看到此输出,并且要慢慢完成,这在上下文中是什么意思?
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 82
19/04/01 15:34:24 INFO ContextCleaner: Cleaned shuffle 0
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 69
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 30
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 40
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 61
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 41
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 52
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 29
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 31
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 57
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 60
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 87
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 79
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 78
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 84
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 34
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 49
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 75
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 88
19/04/01 15:34:24 INFO ContextCleaner: Cleaned accumulator 48
我正在使用的
name := "BigData"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies += "com.github.tototoshi" %% "scala-csv" % "1.3.5"
// https://mvnrepository.com/artifact/org.apache.spark/spark-core
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0"
// https://mvnrepository.com/artifact/org.apache.spark/spark-sql
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0"
// https://mvnrepository.com/artifact/com.microsoft.sqlserver/mssql-jdbc
libraryDependencies += "com.microsoft.sqlserver" % "mssql-jdbc" % "6.1.0.jre8"
libraryDependencies += "com.databricks" % "spark-xml_2.11" % "0.4.1"
// https://mvnrepository.com/artifact/com.typesafe.akka/akka-actor
libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.5.19"
// https://mvnrepository.com/artifact/com.typesafe.akka/akka-http
libraryDependencies += "com.typesafe.akka" %% "akka-http" % "10.1.5"
// https://mvnrepository.com/artifact/com.typesafe.akka/akka-stream
libraryDependencies += "com.typesafe.akka" %% "akka-stream" % "2.5.19"
// https://mvnrepository.com/artifact/org.apache.livy/livy-core
libraryDependencies += "org.apache.livy" %% "livy-core" % "0.5.0-incubating"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-core" % "2.9.4"
dependencyOverrides += "com.fasterxml.jackson.core" % "jackson-databind" % "2.9.4"
dependencyOverrides += "com.fasterxml.jackson.module" % "jackson-module-scala_2.11" % "2.9.4"
// https://mvnrepository.com/artifact/net.liftweb/lift-json
libraryDependencies += "net.liftweb" %% "lift-json" % "3.2.0"
// https://mvnrepository.com/artifact/org.json4s/json4s-jackson
libraryDependencies += "org.json4s" %% "json4s-jackson" % "3.6.5"
// https://mvnrepository.com/artifact/org.json4s/json4s-native
libraryDependencies += "org.json4s" %% "json4s-native" % "3.6.5"
// https://mvnrepository.com/artifact/oracle/xdb
//libraryDependencies += "oracle" % "xdb" % "1.0"
答案 0 :(得分:2)
You can use below properties to disable ContextCleaner
spark.cleaner.referenceTracking false
spark.cleaner.referenceTracking.blocking false
spark.cleaner.referenceTracking.blocking.shuffle false
spark.cleaner.referenceTracking.cleanCheckpoints false
但是,如果您在2.1上运行,则无需显式设置这些属性
You can get more info from here
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-service-contextcleaner.html
答案 1 :(得分:0)
ContextCleaner在驱动程序上运行。它会创建并在SparkContext启动时立即启动。上下文清理器线程,用于清理RDD,随机播放和广播状态,累加器(使用keepCleaning方法)。 context-cleaner-periodic-gc请求JVM垃圾收集器。定期运行在ContextCleaner启动时启动,在ContextCleaner停止时停止。