Saving a Spark DataFrame with a map<string,string> column type to a CSV file

Date: 2017-09-21 11:23:20

Tags: scala apache-spark apache-spark-sql user-defined-functions scala-collections

I wrote a udf function to convert Map[String, String] values to String:

    udf("mapToString", (input: Map[String,String]) => input.mkString(","))

spark-shell gives me the error:

    <console>:24: error: overloaded method value udf with alternatives:
      (f: AnyRef,dataType: org.apache.spark.sql.types.DataType)org.apache.spark.sql.expressions.UserDefinedFunction <and>
      ...
    cannot be applied to (String, Map[String,String] => String)
           udf("mapToString", (input: Map[String,String]) => input.mkString(","))

Is there any way to convert a column of Map[String, String] values into String values? I need this conversion because I need to save the DataFrame as a CSV file.
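For reference, the error occurs because `org.apache.spark.sql.functions.udf` has no overload that takes a name as its first argument; it takes the function alone. Registering a udf under a name is done through `spark.udf.register`. A minimal sketch of the two valid call shapes, assuming the `spark` SparkSession that spark-shell provides:

    import org.apache.spark.sql.functions.udf

    // functions.udf takes only the function; use the result with DataFrame APIs
    val mapToStringUdf = udf((input: Map[String, String]) => input.mkString(","))

    // Name-based registration, for use from SQL or selectExpr
    spark.udf.register("mapToString", (input: Map[String, String]) => input.mkString(","))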

1 answer:

Answer 0 (score: 2)

Assuming you have a DataFrame

+---+--------------+
|id |map           |
+---+--------------+
|1  |Map(200 -> DS)|
|2  |Map(300 -> CP)|
+---+--------------+

with the following schema

root
 |-- id: integer (nullable = false)
 |-- map: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

you can write a udf that looks like this:

    import org.apache.spark.sql.functions.udf

    // Join entries with "," and turn each "key -> value" rendering into "key,value"
    def mapToString = udf((map: collection.immutable.Map[String, String]) =>
                           map.mkString(",").replace(" -> ", ","))
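String replacement on the default `key -> value` rendering is brittle if a key or value itself contains `" -> "`. A more explicit alternative that formats each entry directly (a sketch, not part of the original answer):

    def mapToString = udf((map: Map[String, String]) =>
                           map.map { case (k, v) => s"$k,$v" }.mkString(","))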

and use the udf function with the withColumn API:

    import spark.implicits._   // for the $"colname" syntax (already in scope in spark-shell)

    df.withColumn("map", mapToString($"map"))

and you should have a DataFrame with the Map column changed to String:

+---+------+
|id |map   |
+---+------+
|1  |200,DS|
|2  |300,CP|
+---+------+

root
 |-- id: integer (nullable = false)
 |-- map: string (nullable = true)
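With the map column now a plain string, the DataFrame can be written out as CSV. A minimal sketch; the output path is hypothetical:

    df.withColumn("map", mapToString($"map"))
      .write
      .option("header", "true")
      .csv("/tmp/map_to_string_output")   // hypothetical output directory

Note that the converted values themselves contain commas (e.g. "200,DS"), so Spark's CSV writer will quote those fields by default; joining with a different separator, such as `map.mkString("|")`, avoids the quoting.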