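I want to extract the WrappedArray elements in the DataFrame below, which has several rows, into a DataFrame with one row per inner array.
The entire dataset df passed to the printArray function looks like this: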
+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|SECURITY_ID|combined_list |
+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|290X2 |[[290X2, 3FA34789, 0800TS, BOXXBU, BOXXMP, 0101, 5279, 290X2, 18063, P, , 0, 0], [290X2, 3FA34782, 0800TS, BOXXBU, BOXXMP, 0102, 5322, 290X2, -863, N, , 0, 0], [290X2, 3FA34789, 0800TS, BOXXBU, BOXXMP, 0101, 5279, 290X2, -108926, N, , 0, 0]]|
|35G71 |[[35G71, 92115301, 08036C, BOXXBU, BOXXMP, 0154, 8380, 35G71, 8003, P, , 0, 0], [35G71, 92115302, 08036C, BOXXBU, BOXXMP, 0144, 8382, 35G71, -2883, N, , 0, 0]] |
+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
The result I want for each security is as follows:
Security: 290X2
+-----------+----------+----------+------+------+--------+---------+--------+----------------+---------+--------------+------------+
|SECURITY_ID|ACCOUNT_NO|COSTCENTER|    BU|   MPU|LONG_IND|SHORT_IND|QUANTITY|POS_NEG_QUANTITY|PROCESSED|ALLOC_QUANTITY|NET_QUANTITY|
+-----------+----------+----------+------+------+--------+---------+--------+----------------+---------+--------------+------------+
|      290X2|  3FA34789|    0800TS|BOXXBU|BOXXMP|    0101|     5279|   18063|               P|         |             0|           0|
|      290X2|  3FA34782|    0800TS|BOXXBU|BOXXMP|    0102|     5322|    -863|               N|         |             0|           0|
|      290X2|  3FA34789|    0800TS|BOXXBU|BOXXMP|    0101|     5279| -108926|               N|         |             0|           0|
+-----------+----------+----------+------+------+--------+---------+--------+----------------+---------+--------------+------------+
Security: 35G71
+-----------+----------+----------+------+------+--------+---------+--------+----------------+---------+--------------+------------+
|SECURITY_ID|ACCOUNT_NO|COSTCENTER|    BU|   MPU|LONG_IND|SHORT_IND|QUANTITY|POS_NEG_QUANTITY|PROCESSED|ALLOC_QUANTITY|NET_QUANTITY|
+-----------+----------+----------+------+------+--------+---------+--------+----------------+---------+--------------+------------+
|      35G71|  92115301|    08036C|BOXXBU|BOXXMP|    0154|     8380|    8003|               P|         |             0|           0|
|      35G71|  92115302|    08036C|BOXXBU|BOXXMP|    0144|     8382|   -2883|               N|         |             0|           0|
+-----------+----------+----------+------+------+--------+---------+--------+----------------+---------+--------------+------------+
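I tried to open up the second element of each row in df, i.e. t(1), and convert it to a DataFrame, but that failed. The case class I used is shown below as well: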
case class ExplodeWrappedArray(SECURITY_ID: String, ACCOUNT_NO: String, COSTCENTER: String, BU: String, MPU: String, LONG_IND: String, SHORT_IND: String, QUANTITY: String, POS_NEG_QUANTITY: String, PROCESSED: String, ALLOC_QUANTITY: Integer, NET_QUANTITY: Integer)
def printArray(df: DataFrame): Unit = {
    println("Hello")
    //This fails
    df.foreach(t => t(1).asInstanceOf[mutable.WrappedArray(ExplodeWrappedArray)])
    //This works
    //df.foreach(t => openList((t(1))))
}
Passing t(1) to openList and printing the WrappedArray directly works, but I want to open up that array and convert it to a DataFrame.
def openList(a: mutable.WrappedArray[ExplodeWrappedArray]): Unit = {
    import sparkSession.implicits._
    println("Hello")
    //This works
    println(a)
    //This fails
    val b = a.toDF("SECURITY_ID", "ACCOUNT_NO", "COSTCENTER", "BU", "MPU", "LONG_IND", "SHORT_IND", "QUANTITY", "POS_NEG_QUANTITY", "PROCESSED", "ALLOC_QUANTITY", "NET_QUANTITY")
    b.printSchema()
    b.show()
}
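For reference, here is a rough sketch of the kind of transformation I am after. It is only a sketch under assumptions: it assumes combined_list is an array of arrays of strings (if the inner elements are structs, the index-based access would need to change), that the 13 values in each inner array appear in the order shown above with position 7 repeating SECURITY_ID, and the names ExplodeCombinedList, fields and flatten are made up for illustration. All values stay strings here, unlike the Integer fields in my case class.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object ExplodeCombinedList {
  // Assumed mapping from position in each inner array to target column name.
  // Position 7 repeats SECURITY_ID in the sample data, so it is skipped.
  val fields: Seq[(Int, String)] = Seq(
    0 -> "SECURITY_ID", 1 -> "ACCOUNT_NO", 2 -> "COSTCENTER", 3 -> "BU",
    4 -> "MPU", 5 -> "LONG_IND", 6 -> "SHORT_IND", 8 -> "QUANTITY",
    9 -> "POS_NEG_QUANTITY", 10 -> "PROCESSED", 11 -> "ALLOC_QUANTITY",
    12 -> "NET_QUANTITY")

  // One output row per inner array, one column per selected position.
  def flatten(df: DataFrame): DataFrame = {
    val exploded = df.select(explode(col("combined_list")).as("item"))
    val columns = fields.map { case (i, name) => col("item").getItem(i).as(name) }
    exploded.select(columns: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-sketch")
      .getOrCreate()
    import spark.implicits._

    // Small stand-in for df, copied from the sample rows above.
    val df = Seq(
      ("290X2", Seq(
        Seq("290X2", "3FA34789", "0800TS", "BOXXBU", "BOXXMP", "0101", "5279", "290X2", "18063", "P", "", "0", "0"),
        Seq("290X2", "3FA34782", "0800TS", "BOXXBU", "BOXXMP", "0102", "5322", "290X2", "-863", "N", "", "0", "0"))),
      ("35G71", Seq(
        Seq("35G71", "92115301", "08036C", "BOXXBU", "BOXXMP", "0154", "8380", "35G71", "8003", "P", "", "0", "0")))
    ).toDF("SECURITY_ID", "combined_list")

    val flat = flatten(df)

    // Print one table per security, driven from the driver rather than
    // from inside df.foreach, which runs on the executors.
    flat.select("SECURITY_ID").distinct().as[String].collect().foreach { id =>
      println(s"Security: $id")
      flat.filter(col("SECURITY_ID") === id).show(false)
    }

    spark.stop()
  }
}
```

The difference from my attempt is that nothing is cast to WrappedArray[ExplodeWrappedArray] and no DataFrame is created inside foreach; the array is unpacked with column expressions instead. I have not confirmed that this matches the actual schema of df.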