I am new to Scala and Spark. I am writing a sample program using CollectionAccumulator, but the CollectionAccumulator dependency is not being resolved in IntelliJ.
val slist : CollectionAccumulator[String] = new CollectionAccumulator()
sc.register(slist,"Myslist")
Please find the code used above. I tried Accumulator[String] in place of CollectionAccumulator[String], and Accumulator resolves fine.
I have imported the following:
import org.apache.log4j._
import org.apache.spark.{Accumulator, SparkContext}
import org.apache.spark.util._
Dependencies in pom.xml:
<dependencies>
<!-- Scala and Spark dependencies -->
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.2.0-cdh5.3.1</version>
</dependency>
</dependencies>
Please help.
Answer 0 (score: 2)
CollectionAccumulator is supported from Spark 2.0+ onwards, but you are on the Spark 1.2.0 CDH version. Reference: https://spark.apache.org/docs/2.0.0/api/scala/index.html#org.apache.spark.util.CollectionAccumulator
Replace your Spark dependency with:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.1.0.cloudera1</version>
</dependency>
Also make sure "${scala.version}" resolves to Scala 2.11.
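For illustration only, a minimal properties block pinning the Scala version in pom.xml could look like the snippet below; the exact 2.11.x patch release is an assumption, so use whatever matches your cluster:
<properties>
  <!-- Illustrative value: any Scala 2.11.x release that matches spark-core_2.11 -->
  <scala.version>2.11.8</scala.version>
</properties>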
Answer 1 (score: 1)
CollectionAccumulator is only available from Spark v2.0.0 onwards, so just update your Spark version to 2.0+.
Example build.sbt:
name := "smartad-spark-songplaycount"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "2.0.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0"
libraryDependencies += "org.apache.hadoop" % "hadoop-hdfs" % "2.2.0"
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
Example sbt console session with the above .sbt:
sbt console
scala> import org.apache.spark.util.CollectionAccumulator
import org.apache.spark.util.CollectionAccumulator
scala> val slist : CollectionAccumulator[String] = new CollectionAccumulator()
slist: org.apache.spark.util.CollectionAccumulator[String] = Un-registered Accumulator: CollectionAccumulator
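For completeness, here is a minimal sketch of using the accumulator once a Spark 2.x dependency is in place. The local SparkConf/SparkContext setup and the sample data are assumptions for illustration; the register/add/value calls are the standard AccumulatorV2 API:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.util.CollectionAccumulator

// Hypothetical local context, for illustration only
val conf = new SparkConf().setAppName("collection-accumulator-demo").setMaster("local[*]")
val sc = new SparkContext(conf)

// Create and register the accumulator, as in the question
val slist: CollectionAccumulator[String] = new CollectionAccumulator()
sc.register(slist, "Myslist")

// Add values from an action running on the executors
sc.parallelize(Seq("a", "b", "c")).foreach(slist.add)

// Read the accumulated values back on the driver
println(slist.value)  // e.g. [a, b, c] (order not guaranteed)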