How do I take a Spark DataFrame with the following structure:
import scala.collection.JavaConverters._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val sourcedf = spark.createDataFrame(
  List(
    Row(Map("AL" -> "Alabama", "AK" -> "Alaska").asJava),
    Row(Map("TX" -> "Texas", "FL" -> "Florida", "NJ" -> "New Jersey").asJava)
  ).asJava,
  StructType(
    StructField("my_map", MapType(StringType, StringType, false)) ::
    Nil))
or, as text, sourcedf.show(false) shows:
+----------------------------------------------+
|my_map |
+----------------------------------------------+
|[AL -> Alabama, AK -> Alaska] |
|[TX -> Texas, FL -> Florida, NJ -> New Jersey]|
+----------------------------------------------+
and programmatically convert it into this structure:
val targetdf = spark.createDataFrame(
  List(
    Row(List(Map("Key" -> "AL", "Value" -> "Alabama"), Map("Key" -> "AK", "Value" -> "Alaska")).asJava),
    Row(List(Map("Key" -> "TX", "Value" -> "Texas"), Map("Key" -> "FL", "Value" -> "Florida"), Map("Key" -> "NJ", "Value" -> "New Jersey")).asJava)
  ).asJava,
  StructType(
    StructField("my_list", ArrayType(MapType(StringType, StringType, false), false)) ::
    Nil))
or, as text, targetdf.show(false) shows:
+----------------------------------------------------------------------------------------------+
|my_list |
+----------------------------------------------------------------------------------------------+
|[[Key -> AL, Value -> Alabama], [Key -> AK, Value -> Alaska]] |
|[[Key -> TX, Value -> Texas], [Key -> FL, Value -> Florida], [Key -> NJ, Value -> New Jersey]]|
+----------------------------------------------------------------------------------------------+
Answer 0 (score: 0)
So, using Scala, I couldn't figure out how to handle a java.util.Map with the provided Encoders; I would probably have to write one myself, and I figured that was too much work. However, I can see two ways of doing this without converting to java.util.Map, working with scala.collection.immutable.Map instead.
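For context, as far as I can tell spark.implicits._ only derives encoders for Scala collection types, so a case class with a java.util.Map field (the hypothetical JFoo below is just an illustration, not part of either approach) has no usable Encoder:
// Hypothetical illustration only: not part of the solution below.
case class JFoo(my_map: java.util.Map[String, String])
// val broken = sourcedf.as[JFoo]  // fails: no Encoder can be derived for java.util.Map[String, String]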
You can convert to a Dataset[Obj] and flatMap each row's map entries:
import org.apache.spark.sql.Dataset
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import spark.implicits._

case class Foo(my_map: Map[String, String])
case class Bar(my_list: List[Map[String, String]])
implicit val encoder = ExpressionEncoder[List[Map[String, String]]]
val ds: Dataset[Foo] = sourcedf.as[Foo]
// Turn each map entry into its own two-entry map and collect them into a list
val output: Dataset[Bar] = ds.map(x => Bar(x.my_map.flatMap({ case (k, v) => List(Map("key" -> k, "value" -> v)) }).toList))
output.show(false)
Or you can use a UDF:
import org.apache.spark.sql.expressions.UserDefinedFunction
import org.apache.spark.sql.functions.udf
import spark.implicits._

// Same entry-by-entry conversion as above, wrapped in a UDF
val mapToList: Map[String, String] => List[Map[String, String]] = {
  x => x.flatMap({ case (k, v) => List(Map("key" -> k, "value" -> v)) }).toList
}
val mapToListUdf: UserDefinedFunction = udf(mapToList)
val output: Dataset[Row] = sourcedf.select(mapToListUdf($"my_map").as("my_list"))
output.show(false)
Both output:
+----------------------------------------------------------------------------------------------+
|my_list |
+----------------------------------------------------------------------------------------------+
|[[key -> AL, value -> Alabama], [key -> AK, value -> Alaska]] |
|[[key -> TX, value -> Texas], [key -> FL, value -> Florida], [key -> NJ, value -> New Jersey]]|
+----------------------------------------------------------------------------------------------+
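If you happen to be on Spark 2.4 or later (an assumption on my part), the built-in map_entries and transform SQL functions should also be able to do the same reshaping without a UDF; a minimal sketch:
import org.apache.spark.sql.functions.expr

// map_entries turns the map column into an array of (key, value) structs,
// and transform rewraps each struct as a two-entry string map.
val sqlOutput = sourcedf.select(
  expr("transform(map_entries(my_map), e -> map('key', e.key, 'value', e.value))").as("my_list"))
sqlOutput.show(false)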