I am trying to implement a search mechanism using SuccinctRDD. Below is what I tried, based on the documentation:
import edu.berkeley.cs.succinct.kv._
val data = sc.textFile("file:///home/aman/data/jsonDoc1.txt")
val succintdata = data.succinct.persist()
The link is here... succinct RDD
The error I get is:
<console>:32: error: value succinct is not a member of org.apache.spark.rdd.RDD[String]
val succintdata = data.succinct.persist()
I would appreciate it if someone could point out the problem here, or any step I should have followed beforehand.
This is basically the sbt build:
name := "succinttest"
version := "1.0"
scalaVersion := "2.11.7"
resolvers += "Spark Packages Repo" at "http://dl.bintray.com/spark-packages/maven"
libraryDependencies += "org.apache.spark" % "spark-core_2.11" % "1.5.2"
libraryDependencies += "org.apache.kafka" % "kafka_2.11" % "0.8.2.2"
libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "1.5.2"
libraryDependencies += "org.apache.spark" % "spark-sql_2.11" % "1.5.2"
libraryDependencies += "amplab" % "succinct" % "0.1.7"
libraryDependencies += "org.apache.hadoop" % "hadoop-common" % "2.6.0" excludeAll ExclusionRule(organization = "javax.servlet")
Answer 0 (score: 1)
This is a typical implicit-conversion issue in Scala.
When you import the library:
import edu.berkeley.cs.succinct.kv._
you import all the classes and methods in that package, along with all of its implicits. So if you check the package object in the source: https://github.com/amplab/succinct/blob/master/spark/src/main/scala/edu/berkeley/cs/succinct/kv/package.scala
...you will find the following implicit conversion:
implicit class SuccinctContext(sc: SparkContext) {
  def succinctKV[K: ClassTag](filePath: String, storageLevel: StorageLevel = StorageLevel.MEMORY_ONLY)
                             (implicit ordering: Ordering[K]): SuccinctKVRDD[K] =
    SuccinctKVRDD[K](sc, filePath, storageLevel)
}
This means you get a new method on SparkContext that creates a new SuccinctKVRDD from a text file. So try the following code:
import edu.berkeley.cs.succinct.kv._
val data = sc.succinctKV("file:///home/aman/data/jsonDoc1.txt")
You will then have a Succinct RDD on which you can perform all the operations you need, such as search, filterByValue, etc.: https://github.com/amplab/succinct/blob/master/spark/src/main/scala/edu/berkeley/cs/succinct/kv/SuccinctKVRDD.scala
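The mechanism behind this can be illustrated with a plain-Scala sketch that needs no Spark at all. The names below (`Context`, `RichContext`, `greet`) are purely illustrative and not part of the Succinct or Spark API; the point is that an implicit class only adds methods to a type when its enclosing scope is imported, which is exactly why the wrong import produces a "value ... is not a member of ..." error:

```scala
// Minimal sketch of the implicit-class pattern used by Succinct's
// package object. Importing the implicit class "pimps" an existing
// type with new methods without modifying it.
object ImplicitDemo {
  // Stand-in for a third-party type we cannot modify (like SparkContext).
  class Context(val name: String)

  object syntax {
    // Importing syntax._ brings this implicit class into scope, so
    // Context gains a `greet` method, just as importing
    // edu.berkeley.cs.succinct.kv._ gives SparkContext `succinctKV`.
    implicit class RichContext(ctx: Context) {
      def greet(who: String): String = s"${ctx.name} greets $who"
    }
  }

  def main(args: Array[String]): Unit = {
    import syntax._
    val ctx = new Context("demo")
    // Without the import above, `ctx.greet` would not compile:
    // "value greet is not a member of Context" -- the same kind of
    // error as "value succinct is not a member of RDD[String]".
    println(ctx.greet("world"))
  }
}
```

If you remove the `import syntax._` line, the compiler reports the same class of error the question ran into, which is the tell-tale sign of a missing implicit conversion rather than a missing dependency.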