How do I convert DataFrame values to a Map[String, List[String]]?

Time: 2019-07-26 10:55:53

Tags: dataframe apache-spark

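I want to convert the DataFrame below to a Map[String, List[String]]. I have already changed the initial DataFrame so that the "Name" column holds a list, but I can't work out how to convert the result to a Map[String, List[String]].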

DataFrame

+---------+-------+
|City     |  Name |
+---------+-------+
|Mumbai   |[A,B]  |
|Pune     |[C,D]  |
|Delhi    |[A,D]  |
+---------+-------+

Expected output:

Map(Mumbai -> List(A,B), Pune -> List(C,D), Delhi -> List(A,D))

1 answer:

Answer 0: (score: 0)

You can convert the DataFrame to an RDD and collect it as a map, as shown below:

// Assumes df has exactly two columns, in this order:
// City (string) and Name (array<string>).
val result: Map[String, List[String]] = df.rdd
  .map(row => row.getString(0) -> row.getSeq[String](1).toList)
  .collectAsMap() // pulls every row to the driver, so the data must fit in memory
  .toMap          // collectAsMap returns scala.collection.Map; make it immutable
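For a version that can be run end to end, here is a minimal sketch that rebuilds the sample DataFrame from the question and does the same conversion through a typed Dataset instead of the RDD API. The object name `CityNamesToMap`, the helper `toCityMap`, the local master and the app name are all made up for illustration. It assumes the DataFrame has exactly two columns, a string followed by an array of strings, and that the collected data is small enough to fit on the driver.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CityNamesToMap {

  // Hypothetical helper: expects exactly two columns, (string, array<string>).
  // Tuple encoders map columns by position, so the column names do not matter.
  def toCityMap(df: DataFrame): Map[String, List[String]] = {
    import df.sparkSession.implicits._
    df.as[(String, Seq[String])]
      .collect() // brings all rows to the driver
      .map { case (city, names) => city -> names.toList }
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("city-names-to-map")
      .getOrCreate()
    import spark.implicits._

    // Recreate the sample data from the question.
    val df = Seq(
      ("Mumbai", Seq("A", "B")),
      ("Pune", Seq("C", "D")),
      ("Delhi", Seq("A", "D"))
    ).toDF("City", "Name")

    println(toCityMap(df))
    spark.stop()
  }
}
```

If a city appears in more than one row, `toMap` keeps only the last entry for that key, so group the names per city before collecting if that can happen in your data.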

Hope this helps!