I wrote a udf function to convert Map[String,String] values to String:
udf("mapToString", (input: Map[String,String]) => input.mkString(","))
spark-shell gives me the error:
<console>:24: error: overloaded method value udf with alternatives:
(f: AnyRef,dataType: org.apache.spark.sql.types.DataType)org.apache.spark.sql.expressions.UserDefinedFunction <and>
...
cannot be applied to (String, Map[String,String] => String)
udf("mapToString", (input: Map[String,String]) => input.mkString(","))
Is there any way to convert a column of Map[String,String] values to string values? I need this conversion because I need to save the DataFrame as a CSV file.
Answer 0 (score: 2)
Assuming you have a DataFrame
+---+--------------+
|id |map |
+---+--------------+
|1 |Map(200 -> DS)|
|2 |Map(300 -> CP)|
+---+--------------+
with the following schema
root
|-- id: integer (nullable = false)
|-- map: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
you can write a udf that looks like:
def mapToString = udf((map: collection.immutable.Map[String, String]) =>
map.mkString.replace(" -> ", ","))
and use the udf function with the withColumn API
df.withColumn("map", mapToString($"map"))
and you should get a DataFrame with the Map column changed to String:
+---+------+
|id |map |
+---+------+
|1 |200,DS|
|2 |300,CP|
+---+------+
root
|-- id: integer (nullable = false)
|-- map: string (nullable = true)
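As a side note, the original error occurs because udf takes only the function (and optionally a return DataType), not a name string; a name is needed only when registering the function for SQL via spark.udf.register. A minimal end-to-end sketch putting the pieces together (column names and the output path are illustrative, matching the example data above):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

object MapToStringExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("mapToString")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Example data matching the answer above
    val df = Seq(
      (1, Map("200" -> "DS")),
      (2, Map("300" -> "CP"))
    ).toDF("id", "map")

    // udf takes the function itself; no name argument here
    val mapToString = udf((m: Map[String, String]) =>
      m.mkString.replace(" -> ", ","))

    val flat = df.withColumn("map", mapToString($"map"))
    flat.show(false)

    // The DataFrame now holds only atomic types, so it can be written as CSV
    flat.write.option("header", "true").csv("/tmp/output") // path is illustrative

    spark.stop()
  }
}
```

Note that mkString with no separator works cleanly here because each map has a single entry; for multi-entry maps you would want an explicit separator, e.g. m.map { case (k, v) => s"$k,$v" }.mkString(";").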