Converting a list of lists to a DataFrame in Spark Scala

Posted: 2018-07-19 08:31:14

Tags: scala apache-spark

I have a list of lists like this:

List(
  List(2,(String,String,String......),1,(String,String,String......),1,(String,String,String......)),
  List(3,(String,String,String......),1,(String,String,String......),1,(String,String,String......)),
  List(3,(String,String,String......),2,(String,String,String......),1,(String,String,String......)),
  List(3,(String,String,String......),2,(String,String,String......),2,(String,String,String......)),
  List(3,(String,String,String......),1,(String,String,String......),2,(String,String,String......))
)

The output I expect looks like this:

+-----+------------------+-----+------------------+-----+------------------+
|   _1|                _2|   _3|                _4|   _5|                _6|
+-----+------------------+-----+------------------+-----+------------------+
|2    |(String,String...)|1    |(String,String...)|1    |(String,String...)|
|3    |(String,String...)|1    |(String,String...)|1    |(String,String...)|
|3    |(String,String...)|2    |(String,String...)|1    |(String,String...)|
|3    |(String,String...)|2    |(String,String...)|2    |(String,String...)|
|3    |(String,String...)|1    |(String,String...)|2    |(String,String...)|
+-----+------------------+-----+------------------+-----+------------------+

How can I do this conversion in Spark Scala? Any help would be greatly appreciated.

1 answer:

Answer 0 (score: 2)

For testing, I created test data with the same structure as described in the question:

val nestedList = List(
  List(2,("String","String","String","String","String","String"),1,("String","String","String","String","String","String"),1,("String","String","String","String","String","String")),
  List(3,("String","String","String","String","String","String"),1,("String","String","String","String","String","String"),1,("String","String","String","String","String","String")),
  List(3,("String","String","String","String","String","String"),2,("String","String","String","String","String","String"),1,("String","String","String","String","String","String")),
  List(3,("String","String","String","String","String","String"),2,("String","String","String","String","String","String"),2,("String","String","String","String","String","String")),
  List(3,("String","String","String","String","String","String"),1,("String","String","String","String","String","String"),2,("String","String","String","String","String","String"))
)

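Now you can convert each inner list into a tuple and call toDF. Adjust the number of tuple elements and the type casts to match your data. You should get the output you want: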

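import spark.implicits._  // provides toDF() on local Scala collections; assumes a SparkSession named spark, as in spark-shell
nestedList.map(x => (x(0).asInstanceOf[Int], x(1).toString, x(2).asInstanceOf[Int], x(3).toString, x(4).asInstanceOf[Int], x(5).toString)).toDF().show()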
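Here is a rough sketch of the same conversion step outside Spark. The helper turns one inner list into a typed 6-tuple using pattern matching instead of bare asInstanceOf casts. The names RowConversion, FlatRow and toFlatRow are hypothetical. The sketch assumes every inner list has exactly six elements, with Int values at positions 0, 2 and 4. A row with any other shape fails with an IllegalArgumentException that shows the offending list, rather than a ClassCastException inside a Spark job.

```scala
// Hypothetical helper: converts one List[Any] row into a typed tuple
// with the same shape as the one-liner above.
object RowConversion {
  // Flattened row shape: (Int, String, Int, String, Int, String)
  type FlatRow = (Int, String, Int, String, Int, String)

  // The pattern checks both the element count (exactly six) and that
  // positions 0, 2 and 4 hold Ints; the tuples at 1, 3 and 5 are
  // rendered with toString, e.g. ("a","b") becomes "(a,b)".
  def toFlatRow(xs: List[Any]): FlatRow = xs match {
    case List(a: Int, b, c: Int, d, e: Int, f) =>
      (a, b.toString, c, d.toString, e, f.toString)
    case other =>
      throw new IllegalArgumentException(s"Unexpected row shape: $other")
  }

  def main(args: Array[String]): Unit = {
    // Small hypothetical sample with the same layout as nestedList
    val sample: List[List[Any]] = List(
      List(2, ("a", "b"), 1, ("c", "d"), 1, ("e", "f")),
      List(3, ("g", "h"), 2, ("i", "j"), 2, ("k", "l"))
    )
    sample.map(toFlatRow).foreach(println)
  }
}
```

With import spark.implicits._ in scope, nestedList.map(RowConversion.toFlatRow).toDF().show() should give the same table as the one-liner above.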

which should give you:

+---+--------------------+---+--------------------+---+--------------------+
| _1|                  _2| _3|                  _4| _5|                  _6|
+---+--------------------+---+--------------------+---+--------------------+
|  2|(String,String,St...|  1|(String,String,St...|  1|(String,String,St...|
|  3|(String,String,St...|  1|(String,String,St...|  1|(String,String,St...|
|  3|(String,String,St...|  2|(String,String,St...|  1|(String,String,St...|
|  3|(String,String,St...|  2|(String,String,St...|  2|(String,String,St...|
|  3|(String,String,St...|  1|(String,String,St...|  2|(String,String,St...|
+---+--------------------+---+--------------------+---+--------------------+
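If you want meaningful column names instead of _1 to _6, toDF also accepts a list of column names. The following is a minimal, self-contained sketch. It assumes Spark 2.x or 3.x is on the classpath and runs with a local master. The object name NestedListToDF, the shortened sample data and the column names count1, values1 and so on are hypothetical placeholders.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object NestedListToDF {
  // Hypothetical column names; replace with whatever fits your data
  val columnNames: Seq[String] =
    Seq("count1", "values1", "count2", "values2", "count3", "values3")

  // Same cast-based conversion as in the answer, plus named columns
  def build(spark: SparkSession, nestedList: List[List[Any]]): DataFrame = {
    import spark.implicits._ // tuple encoders and toDF for local collections
    nestedList
      .map(x => (x(0).asInstanceOf[Int], x(1).toString,
                 x(2).asInstanceOf[Int], x(3).toString,
                 x(4).asInstanceOf[Int], x(5).toString))
      .toDF(columnNames: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation; master and appName are arbitrary
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nested-list-to-df")
      .getOrCreate()

    val nestedList: List[List[Any]] = List(
      List(2, ("s1", "s2", "s3"), 1, ("s4", "s5", "s6"), 1, ("s7", "s8", "s9")),
      List(3, ("s1", "s2", "s3"), 2, ("s4", "s5", "s6"), 1, ("s7", "s8", "s9"))
    )

    val df = build(spark, nestedList)
    df.printSchema()
    df.show(false) // false = do not truncate long string cells
    spark.stop()
  }
}
```

Passing false to show keeps the full "(String,String,...)" text visible instead of cutting it off at 20 characters as in the table above.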

I hope this helps.