How to create a Spark SQL DataFrame from a list of Map objects

Date: 2019-02-17 17:55:39

Tags: scala apache-spark dataframe apache-spark-sql bigdata

I have multiple Map[String, String] objects in a List (Scala). For example:

val map1 = Map("EMP_NAME" -> "Ahmad", "DOB" -> "01-10-1991", "CITY" -> "Dubai")
val map2 = Map("EMP_NAME" -> "Rahul", "DOB" -> "06-12-1991", "CITY" -> "Mumbai")
val map3 = Map("EMP_NAME" -> "John", "DOB" -> "11-04-1996", "CITY" -> "Toronto")
val list = List(map1, map2, map3)

Now I want to create a DataFrame from it that looks like this:

EMP_NAME    DOB             CITY
Ahmad       01-10-1991      Dubai
Rahul       06-12-1991      Mumbai
John        11-04-1996      Toronto

How can I achieve this?

3 Answers:

Answer 0 (score: 1)

You can do it like this:

import spark.implicits._

// m.get returns an Option[String], so a missing key becomes null in the DataFrame
val df = list
  .map(m => (m.get("EMP_NAME"), m.get("DOB"), m.get("CITY")))
  .toDF("EMP_NAME", "DOB", "CITY")

df.show()

+--------+----------+-------+
|EMP_NAME|       DOB|   CITY|
+--------+----------+-------+
|   Ahmad|01-10-1991|  Dubai|
|   Rahul|06-12-1991| Mumbai|
|    John|11-04-1996|Toronto|
+--------+----------+-------+

Answer 1 (score: 0)

A less specific approach, for example:

val map1 = Map("EMP_NAME" -> "Ahmad", "DOB" -> "01-10-1991", "CITY" -> "Dubai")
val map2 = Map("EMP_NAME" -> "John",  "DOB" -> "01-10-1992", "CITY" -> "Mumbai")
// ...
val list = List(map1, map2) // map3, ...
val RDDmap = sc.parallelize(list)

// Get the column names dynamically from the first Map
val cols = RDDmap.take(1).flatMap(x => x.keys)

// Look up each value by column name so the tuple order always matches cols
// (the tuple itself is still fixed at three columns)
val df = RDDmap.map { value =>
  (value(cols(0)), value(cols(1)), value(cols(2)))
}.toDF(cols: _*) // dynamic column names assigned

df.show(false)

Returns:

+--------+----------+------+
|EMP_NAME|DOB       |CITY  |
+--------+----------+------+
|Ahmad   |01-10-1991|Dubai |
|John    |01-10-1992|Mumbai|
+--------+----------+------+
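The tuple-based snippet above assigns the column names dynamically but is still tied to exactly three values per Map. A minimal sketch for any number of keys, assuming every Map shares the key set of the first one and all values are strings, builds Row objects and an explicit schema instead. The object name MapsToDataFrame, the helper toDF, and the local SparkSession are illustrative assumptions, not part of the original answer.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object MapsToDataFrame {
  // Hypothetical helper: turns a list of Maps into a DataFrame of string columns
  def toDF(spark: SparkSession, list: List[Map[String, String]]): DataFrame = {
    // Column order is taken from the first Map's key order
    val cols = list.head.keys.toSeq

    // Every column is declared as a nullable string
    val schema = StructType(cols.map(c => StructField(c, StringType, nullable = true)))

    // Look up each value by name; a key missing from a Map becomes null
    val rows = list.map(m => Row.fromSeq(cols.map(c => m.getOrElse(c, null))))

    spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, reuse the existing `spark`
    val spark = SparkSession.builder().master("local[*]").appName("maps-to-df").getOrCreate()

    val list = List(
      Map("EMP_NAME" -> "Ahmad", "DOB" -> "01-10-1991", "CITY" -> "Dubai"),
      Map("EMP_NAME" -> "Rahul", "DOB" -> "06-12-1991", "CITY" -> "Mumbai"),
      Map("EMP_NAME" -> "John", "DOB" -> "11-04-1996", "CITY" -> "Toronto")
    )

    toDF(spark, list).show(false)
    spark.stop()
  }
}
```

Because the values are looked up by column name, the result does not depend on each Map iterating its entries in the same order.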

Or, to answer your sub-question, something like the following (at least I think this is what you are asking, but perhaps not):

val RDDmap = sc.parallelize(List(
   Map("EMP_NAME" -> "Ahmad", "DOB" -> "01-10-1991", "CITY" -> "Dubai"),
   Map("EMP_NAME" -> "John",  "DOB" -> "01-10-1992", "CITY" -> "Mumbai")))
   // ...

// Get the column names dynamically from the first Map
val cols = RDDmap.take(1).flatMap(x => x.keys)

// Look up each value by column name so the tuple order always matches cols
// (the tuple itself is still fixed at three columns)
val df = RDDmap.map { value =>
  (value(cols(0)), value(cols(1)), value(cols(2)))
}.toDF(cols: _*) // dynamic column names assigned

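You can of course build the list dynamically, but you still need to assign the Map elements. See "Appending Data to List or any other collection Dynamically in scala". I would just read from a file and be done with it.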
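As a rough illustration of building the list dynamically, the sketch below turns delimited text lines into one Map per record by zipping each record with a header line. The comma-separated layout, the object name BuildMapList, and the parse function are assumptions made for this example; the answer does not say what the file looks like.

```scala
object BuildMapList {
  // Hypothetical parser: the first line holds the column names,
  // every following line holds one comma-separated record
  def parse(lines: Seq[String]): List[Map[String, String]] = {
    val header = lines.head.split(",").map(_.trim)

    // Pair each field with its column name to get one Map per record
    lines.tail.map { line =>
      header.zip(line.split(",").map(_.trim)).toMap
    }.toList
  }

  def main(args: Array[String]): Unit = {
    // Inline sample standing in for file contents
    val lines = Seq(
      "EMP_NAME,DOB,CITY",
      "Ahmad,01-10-1991,Dubai",
      "Rahul,06-12-1991,Mumbai"
    )

    val list = parse(lines)
    list.foreach(println)
  }
}
```

To read from an actual file, the lines could come from scala.io.Source.fromFile(path).getLines().toList, where path is whatever file you have. The resulting list can then be passed to any of the approaches above.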

Answer 2 (score: 0)
