Error: value succinct is not a member of org.apache.spark.rdd.RDD[String]

Date: 2016-07-08 10:43:40

Tags: search apache-spark spark-streaming

I am trying to build a search mechanism using SuccinctRDD. Here is what I tried, following the documentation:

import edu.berkeley.cs.succinct.kv._
val data = sc.textFile("file:///home/aman/data/jsonDoc1.txt")
val succintdata = data.succinct.persist()

The link is here: succinct RDD

The error I get is shown below:
<console>:32: error: value succinct is not a member of org.apache.spark.rdd.RDD[String]
         val succintdata = data.succinct.persist()  

It would be great if someone could point out the problem here, or any steps I should have followed beforehand.

This is basically the sbt build:

name := "succinttest"

version := "1.0"

scalaVersion := "2.11.7"

resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"

libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.5.2"

libraryDependencies += "org.apache.kafka" % "kafka_2.11" % "0.8.2.2"

libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "1.5.2"

libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "1.5.2"

libraryDependencies += "amplab" % "succinct" % "0.1.7"

libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.0" excludeAll ExclusionRule(organization = "javax.servlet")

1 Answer:

Answer 0: (score: 1)

This is a typical implicit-conversion problem in Scala.

When you import the library:

import edu.berkeley.cs.succinct.kv._

you are importing all of the classes/methods in this package, including all of its implicits. So, if you check the package object in the source code: https://github.com/amplab/succinct/blob/master/spark/src/main/scala/edu/berkeley/cs/succinct/kv/package.scala

...you will find the following implicit conversion:

implicit class SuccinctContext(sc: SparkContext) {
  def succinctKV[K: ClassTag](filePath: String, storageLevel: StorageLevel = StorageLevel.MEMORY_ONLY)
      (implicit ordering: Ordering[K]): SuccinctKVRDD[K] =
    SuccinctKVRDD[K](sc, filePath, storageLevel)
}

This means you have a new method on SparkContext that creates a SuccinctKVRDD from a text file. So try the following code:

import edu.berkeley.cs.succinct.kv._
val data = sc.succinctKV("file:///home/aman/data/jsonDoc1.txt")

Then you will have a succinct RDD on which you can do all the operations you need, such as search, filterByValue, etc.: https://github.com/amplab/succinct/blob/master/spark/src/main/scala/edu/berkeley/cs/succinct/kv/SuccinctKVRDD.scala
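As a rough sketch of what usage might look like once the KV RDD is loaded (the explicit `String` key type, the sample query, and the exact `search` signature here are assumptions based on the SuccinctKVRDD source linked above, not verified against this library version):

```scala
import edu.berkeley.cs.succinct.kv._

// Load the text file through the implicit SuccinctContext method;
// the key type parameter is supplied explicitly here because nothing
// else in the call constrains K (assumed, check the package object).
val kvRDD = sc.succinctKV[String]("file:///home/aman/data/jsonDoc1.txt")

// search is assumed to return the keys of all records whose value
// contains the query term, operating directly on the compressed data
val hits = kvRDD.search("someTerm")
hits.collect().foreach(println)
```

The point of Succinct is that queries like `search` run against the compressed representation, so the RDD never has to be decompressed to be queried.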