I have a dataframe with the following schema and a sample record:
root
 |-- name: string (nullable = true)
 |-- matches: map (nullable = true)
 |    |-- key: string
 |    |-- value: integer (valueContainsNull = false)
+---------------+------------------------------------------------------------------------------------------+
|name           |matches                                                                                   |
+---------------+------------------------------------------------------------------------------------------+
|CVS_Extra      |Map(MLauer -> 1, MichaelBColeman -> 1, OhioFoodbanks -> 1, 700wlw -> 1, cityofdayton -> 1)|
+---------------+------------------------------------------------------------------------------------------+
I am trying to convert the map-type column to JSON using the code below (json4s library):
val d = countDF.map( row => (row(0),convertMapToJSON(row(1).asInstanceOf[Map[String, Int]]).toString()))
But it fails with:
java.lang.ClassNotFoundException: scala.Any
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at scala.reflect.runtime.JavaMirrors$JavaMirror.javaClass(JavaMirrors.scala:555)
at scala.reflect.runtime.JavaMirrors$JavaMirror$$anonfun$classToJava$1.apply(JavaMirrors.scala:1210)
at scala.reflect.runtime.JavaMirrors$JavaMirror$$anonfun$classToJava$1.apply(JavaMirrors.scala:1202)
at scala.reflect.runtime.TwoWayCaches$TwoWayCache$$anonfun$toJava$1.apply(TwoWayCaches.scala:50)
at scala.reflect.runtime.Gil$class.gilSynchronized(Gil.scala:19)
at scala.reflect.runtime.JavaUniverse.gilSynchronized(JavaUniverse.scala:16)
at scala.reflect.runtime.TwoWayCaches$TwoWayCache.toJava(TwoWayCaches.scala:45)
at scala.reflect.runtime.JavaMirrors$JavaMirror.classToJava(JavaMirrors.scala:1202)
at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:194)
at scala.reflect.runtime.JavaMirrors$JavaMirror.runtimeClass(JavaMirrors.scala:54)
at org.apache.spark.sql.catalyst.ScalaReflection$.getClassFromType(ScalaReflection.scala:682)
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$dataTypeFor(ScalaReflection.scala:84)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$10.apply(ScalaReflection.scala:614)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$10.apply(ScalaReflection.scala:607)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:607)
at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:438)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:71)
at org.apache.spark.sql.Encoders$.product(Encoders.scala:275)
at org.apache.spark.sql.LowPrioritySQLImplicits$class.newProductEncoder(SQLImplicits.scala:233)
at org.apache.spark.sql.SQLImplicits.newProductEncoder(SQLImplicits.scala:33)
Scala version 2.11, json4s-jackson_2.11, Spark 2.2.0
Can anyone suggest how to overcome this error? Thanks in advance.
Answer 0 (score: 0)
Your code fails because you use the apply method incorrectly: row(0) returns Any, and Spark cannot create an encoder for a tuple containing Any (hence the ClassNotFoundException: scala.Any). Use the typed Row accessors instead, for example:
countDF.map(row =>
  (row.getString(0), convertMapToJSON(row.getMap[String, Int](1)).toString())
)
For details, see Spark extracting values from a Row.
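Two small caveats worth noting (an addition, not part of the original answer): Row.getMap returns a scala.collection.Map, so an explicit .toMap may be needed if convertMapToJSON expects an immutable Map[String, Int], and the result of .map is a Dataset of tuples with columns _1/_2. Assuming spark.implicits._ is already in scope (as the question's use of .map implies), a sketch:

countDF.map { row =>
  // .toMap is only needed if convertMapToJSON takes an immutable Map; it is harmless otherwise
  (row.getString(0), convertMapToJSON(row.getMap[String, Int](1).toMap).toString())
}.toDF("name", "matches")  // restore readable column names instead of _1/_2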
But all you really need here is select / withColumn with to_json:
import org.apache.spark.sql.functions.to_json
countDF.withColumn("matches", to_json($"matches"))
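For the sample row above, this should yield a JSON string column along these lines (illustrative output only; the key order of a map column is not guaranteed):

countDF.withColumn("matches", to_json($"matches")).select("matches").head.getString(0)
// e.g. {"MLauer":1,"MichaelBColeman":1,"OhioFoodbanks":1,"700wlw":1,"cityofdayton":1}

Note that the $ column syntax requires import spark.implicits._; alternatively use col("matches") from org.apache.spark.sql.functions.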
If your function uses more complex logic, use a udf:
import org.apache.spark.sql.functions.udf
val convert_map_to_json = udf(
  (map: Map[String, Int]) => convertMapToJSON(map).toString
)
countDF.withColumn("matches", convert_map_to_json($"matches"))
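convertMapToJSON itself is not shown in the question, so for completeness here is a minimal json4s-based sketch of such a helper (purely an assumption about its shape):

import org.json4s.JsonDSL._                               // implicit Map -> JValue conversion
import org.json4s.jackson.JsonMethods.{compact, render}

// hypothetical stand-in for the question's helper: serialize a map to a JSON string
def convertMapToJSON(map: Map[String, Int]): String = compact(render(map))

With a String return type like this, the trailing .toString / .toString() calls above are effectively no-ops.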