如何将地图的RDD转换为数据帧

时间:2016-11-24 08:00:15

标签: scala apache-spark apache-spark-sql

我有地图的RDD,我想将其转换为数据帧 这是RDD的输入格式

val mapRDD: RDD[Map[String, String]] = sc.parallelize(Seq(
   Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
   Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"),
   Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401"),
   Map("empid" -> "15", "empName" -> "Michale", "depId" -> "501"),
   Map("empid" -> "16", "empName" -> "John", "depId" -> "701")))

有没有办法转换成数据框,如

 val df=mapRDD.toDf

df.show

empid,  empName,    depId
12      Rohan       201
13      Ross        201
14      Richard     401
15      Michale     501
16      John        701

1 个答案:

答案 0 :(得分:13)

您可以轻松将其转换为Spark DataFrame:

这是一个可以解决问题的代码:

val mapRDD= sc.parallelize(Seq(
   Map("empid" -> "12", "empName" -> "Rohan", "depId" -> "201"),
   Map("empid" -> "13", "empName" -> "Ross", "depId" -> "201"),
   Map("empid" -> "14", "empName" -> "Richard", "depId" -> "401"),
   Map("empid" -> "15", "empName" -> "Michale", "depId" -> "501"),
   Map("empid" -> "16", "empName" -> "John", "depId" -> "701")))

val columns=mapRDD.take(1).flatMap(a=>a.keys)

val resultantDF=mapRDD.map{value=>
      val list=value.values.toList
      (list(0),list(1),list(2))
      }.toDF(columns:_*)

resultantDF.show()

输出结果为:

+-----+-------+-----+
|empid|empName|depId|
+-----+-------+-----+
|   12|  Rohan|  201|
|   13|   Ross|  201|
|   14|Richard|  401|
|   15|Michale|  501|
|   16|   John|  701|
+-----+-------+-----+