JSON4s can't find constructor w/ Spark

Asked: 2015-04-16 06:02:47

Tags: java scala apache-spark json4s

I'm running into a problem trying to parse JSON in my Spark job. I'm using Spark 1.1.0, json4s, and the Cassandra Spark Connector, with DSE 4.6. The exception thrown is:

org.json4s.package$MappingException: Can't find constructor for BrowserData
    org.json4s.reflect.ScalaSigReader$.readConstructor(ScalaSigReader.scala:27)
    org.json4s.reflect.Reflector$ClassDescriptorBuilder.ctorParamType(Reflector.scala:108)
    org.json4s.reflect.Reflector$ClassDescriptorBuilder$$anonfun$6.apply(Reflector.scala:98)
    org.json4s.reflect.Reflector$ClassDescriptorBuilder$$anonfun$6.apply(Reflector.scala:95)
    scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)

My code looks like this:

import org.json4s._
import org.json4s.jackson.JsonMethods._  // `parse` comes from here; jackson backend assumed

case class BrowserData(navigatorObjectData: Option[NavigatorObjectData],
                       flash_version: Option[FlashVersion],
                       viewport: Option[Viewport],
                       performanceData: Option[PerformanceData])

.... other case classes

def parseJson(b: Option[String]): Option[String] = {
  implicit val formats = DefaultFormats
  for {
    browserDataStr <- b
    browserData = parse(browserDataStr).extract[BrowserData]  // this extract is where it blows up
    navObject <- browserData.navigatorObjectData
    userAgent <- navObject.userAgent
  } yield userAgent
}

def getJavascriptUa(rows: Iterable[com.datastax.spark.connector.CassandraRow]): Option[String] = {
  implicit val formats = DefaultFormats
  // user agent reported by the page's JS, taken from the first row that has browser_data
  rows.collectFirst { case r if r.getStringOption("browser_data").isDefined =>
    parseJson(r.getStringOption("browser_data"))
  }.flatten
}

def getRequestUa(rows: Iterable[com.datastax.spark.connector.CassandraRow]): Option[String] = {
  // user agent from the request header, taken from the first row that has ua
  rows.collectFirst { case r if r.getStringOption("ua").isDefined =>
    r.getStringOption("ua")
  }.flatten
}

def checkUa(rows: Iterable[com.datastax.spark.connector.CassandraRow], sessionId: String): Option[Boolean] = {
  // Some(true/false) when both user agents are present, None if either is missing
  for {
    jsUa <- getJavascriptUa(rows)
    reqUa <- getRequestUa(rows)
  } yield jsUa == reqUa
}

def run(name: String) = {
  // group each session's rows together, then compare user agents per session
  val rdd = sc.cassandraTable("beehive", name).groupBy(r => r.getString("session_id"))
  val counts = rdd.map(r => checkUa(r._2, r._1))
  counts
}
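
Nothing actually runs until an action forces evaluation, so calling run on its own just builds the RDD; triggering it amounts to something like the following (the table name here is made up):

// "page_views" is a made-up table name; any table in the beehive keyspace behaves the same way.
val results = run("page_views")
results.take(10).foreach(println)  // the MappingException surfaces here, during evaluation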

I load the file into the REPL with :load and then call the run function. As far as I can tell, the failure happens inside the parseJson function. I've tried all sorts of things to get this working. Following similar posts, I made sure my case classes are at the top level of the file. I've also tried compiling the case class definitions into a jar and including the jar like this: /usr/bin/dse spark --jars case_classes.jar
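
To rule Spark out, I'd expect a bare json4s round trip against a compiled, top-level case class to succeed; a minimal sketch of that check (the Probe class and the sample JSON are made up, jackson backend assumed):

import org.json4s._
import org.json4s.jackson.JsonMethods._

// Minimal stand-in for BrowserData: top-level and compiled ahead of time,
// so there is no hidden outer-class constructor parameter for reflection to trip on.
case class Probe(name: Option[String], count: Option[Int])

object ProbeCheck extends App {
  implicit val formats = DefaultFormats
  val probe = parse("""{"name": "ua-test", "count": 1}""").extract[Probe]
  println(probe)  // Probe(Some(ua-test),Some(1))
}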

I've also tried adding them with something like this: sc.getConf.setJars(Seq("/home/ubuntu/case_classes.jar"))
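
As I understand it, setJars is normally applied to the SparkConf before the SparkContext is constructed (sc.getConf hands back a copy, so mutating it after the context exists may do nothing). Roughly this sketch, with a made-up app name:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("ua-check")                         // hypothetical app name
  .setJars(Seq("/home/ubuntu/case_classes.jar"))  // ships the jar to the executors
val sc = new SparkContext(conf)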

Still the same error. Should I compile all of my code into a jar? Is this a Spark problem or a json4s problem? Any help is appreciated.

0 Answers:

No answers yet.