Input dataset:
CustomerID CustomerName Sun Mon Tue
1 ABC 0 12 10
2 DEF 10 0 0
Required output dataset:
CustomerID CustomerName Day Value
1 ABC Sun 0
1 ABC Mon 12
1 ABC Tue 10
2 DEF Sun 10
2 DEF Mon 0
2 DEF Tue 0
Note that my real dataset has 82 columns like "Sun Mon Tue"!
Answer 0 (score: 2)
Assuming your input dataset is generated from a case class
case class infos(CustomerID: Int, CustomerName: String, Sun: Int, Mon: Int, Tue: Int)
For testing purposes, I am creating a dataset
import sqlContext.implicits._
val ds = Seq(
infos(1, "ABC", 0, 12, 10),
infos(2, "DEF", 10, 0, 0)
).toDS
which should give you your input dataset
+----------+------------+---+---+---+
|CustomerID|CustomerName|Sun|Mon|Tue|
+----------+------------+---+---+---+
|1         |ABC         |0  |12 |10 |
|2         |DEF         |10 |0  |0  |
+----------+------------+---+---+---+
To get the final required dataset, you need to create another case class
case class finalInfos(CustomerID: Int, CustomerName: String, Day: String, Value: Int)
The final required dataset can then be obtained as follows
val names = ds.schema.fieldNames
ds.flatMap(row => Array(finalInfos(row.CustomerID, row.CustomerName, names(2), row.Sun),
finalInfos(row.CustomerID, row.CustomerName, names(3), row.Mon),
finalInfos(row.CustomerID, row.CustomerName, names(4), row.Tue)))
which should give you the dataset
+----------+------------+---+-----+
|CustomerID|CustomerName|Day|Value|
+----------+------------+---+-----+
|1         |ABC         |Sun|0    |
|1         |ABC         |Mon|12   |
|1         |ABC         |Tue|10   |
|2         |DEF         |Sun|10   |
|2         |DEF         |Mon|0    |
|2         |DEF         |Tue|0    |
+----------+------------+---+-----+
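The flatMap above names each day column by hand, which gets tedious with the 82 columns mentioned in the question. As a possible alternative, here is a minimal sketch that derives the day columns from the schema and unpivots them with explode over an array of structs. It is not part of the original answer. It assumes a Spark 2.x or later SparkSession, that CustomerID and CustomerName are the only identifier columns, and that every remaining column has the same type (Int here). The names idCols, dayCols, pairs and longDf are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

// Assumption: a local session for testing; reuse your own session if you have one.
val spark = SparkSession.builder().master("local[*]").appName("unpivot-sketch").getOrCreate()
import spark.implicits._

// Same sample data as above, built as a DataFrame.
val df = Seq(
  (1, "ABC", 0, 12, 10),
  (2, "DEF", 10, 0, 0)
).toDF("CustomerID", "CustomerName", "Sun", "Mon", "Tue")

// Identifier columns are kept as-is; every other column is treated as a day column.
val idCols = Seq("CustomerID", "CustomerName")
val dayCols = df.columns.filterNot(c => idCols.contains(c))

// One (Day, Value) struct per day column; all values must share one type.
val pairs = dayCols.map(c => struct(lit(c).as("Day"), col(c).as("Value")))

// explode turns the array of structs into one output row per day column.
val longDf = df
  .select(idCols.map(col) :+ explode(array(pairs: _*)).as("kv"): _*)
  .select(idCols.map(col) ++ Seq(col("kv.Day"), col("kv.Value")): _*)

longDf.show(false)
```

Because dayCols is computed from df.columns, the same code should work unchanged for 82 day columns. If a typed Dataset is needed, longDf.as[finalInfos] should convert the result, given the finalInfos case class defined above.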