I'm running into a problem parsing JSON in my Spark job. I'm using Spark 1.1.0, json4s, and the Cassandra Spark Connector, on DSE 4.6. The exception thrown is:
org.json4s.package$MappingException: Can't find constructor for BrowserData
    org.json4s.reflect.ScalaSigReader$.readConstructor(ScalaSigReader.scala:27)
    org.json4s.reflect.Reflector$ClassDescriptorBuilder.ctorParamType(Reflector.scala:108)
    org.json4s.reflect.Reflector$ClassDescriptorBuilder$$anonfun$6.apply(Reflector.scala:98)
    org.json4s.reflect.Reflector$ClassDescriptorBuilder$$anonfun$6.apply(Reflector.scala:95)
    scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
My code looks like this:
// json4s imports needed for parse/extract below (jackson is the
// backend Spark 1.1.0 bundles, as far as I can tell)
import org.json4s._
import org.json4s.jackson.JsonMethods._

case class BrowserData(navigatorObjectData: Option[NavigatorObjectData],
                       flash_version: Option[FlashVersion],
                       viewport: Option[Viewport],
                       performanceData: Option[PerformanceData])
.... other case classes
def parseJson(b: Option[String]): Option[String] = {
  implicit val formats = DefaultFormats
  for {
    browserDataStr <- b
    // the extract below is where the MappingException appears to originate
    browserData = parse(browserDataStr).extract[BrowserData]
    navObject <- browserData.navigatorObjectData
    userAgent <- navObject.userAgent
  } yield userAgent
}
def getJavascriptUa(rows: Iterable[com.datastax.spark.connector.CassandraRow]): Option[String] = {
  implicit val formats = DefaultFormats
  rows.collectFirst {
    case r if r.getStringOption("browser_data").isDefined =>
      parseJson(r.getStringOption("browser_data"))
  }.flatten
}
def getRequestUa(rows: Iterable[com.datastax.spark.connector.CassandraRow]): Option[String] = {
  rows.collectFirst {
    case r if r.getStringOption("ua").isDefined =>
      r.getStringOption("ua")
  }.flatten
}
def checkUa(rows: Iterable[com.datastax.spark.connector.CassandraRow], sessionId: String): Option[Boolean] = {
  for {
    jsUa <- getJavascriptUa(rows)
    reqUa <- getRequestUa(rows)
  } yield jsUa == reqUa
}
def run(name: String) = {
  // group each session's rows together, then compare user agents per session
  val rdd = sc.cassandraTable("beehive", name).groupBy(r => r.getString("session_id"))
  val counts = rdd.map(r => checkUa(r._2, r._1))
  counts
}
I load the file into the REPL with :load and then call the run function. As far as I can tell, the failure happens in the parseJson function. I've tried all sorts of things to get this working. Following similar posts, I made sure my case classes sit at the top level of the file. I've also tried compiling the case class definitions into a jar and including it like this: /usr/bin/dse spark --jars case_classes.jar, and adding them with sc.getConf.setJars(Seq("/home/ubuntu/case_classes.jar")).
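In case the packaging matters: case_classes.jar is produced with sbt package from a file holding just the case classes, with a build definition roughly like this (a minimal sketch; the exact Scala version is my assumption, chosen because Spark 1.1.0 is built against Scala 2.10):

// build.sbt for case_classes.jar (minimal sketch)
name := "case-classes"

// Spark 1.1.0 runs on Scala 2.10, so the jar has to match.
scalaVersion := "2.10.4"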
Still the same error. Should I compile all of my code into a jar? Is this a Spark problem or a json4s problem? Any help is appreciated.