Convert a WrappedArray column containing rows into a DataFrame with rows

Time: 2019-09-04 19:30:13

Tags: scala apache-spark hadoop apache-spark-sql

I want to extract the WrappedArray elements from the DataFrame below, which contains several rows, into a DataFrame containing the same number of rows.

The full dataset df passed to the printArray function is as follows:

+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|SECURITY_ID|combined_list                                                                                                                                                                                                                                    |
+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|290X2      |[[290X2, 3FA34789, 0800TS, BOXXBU, BOXXMP, 0101, 5279, 290X2, 18063, P, , 0, 0], [290X2, 3FA34782, 0800TS, BOXXBU, BOXXMP, 0102, 5322, 290X2, -863, N, , 0, 0], [290X2, 3FA34789, 0800TS, BOXXBU, BOXXMP, 0101, 5279, 290X2, -108926, N, , 0, 0]]|
|35G71      |[[35G71, 92115301, 08036C, BOXXBU, BOXXMP, 0154, 8380, 35G71, 8003, P, , 0, 0], [35G71, 92115302, 08036C, BOXXBU, BOXXMP, 0144, 8382, 35G71, -2883, N, , 0, 0]]                                                                                  |
+-----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

The result I want for each security is as follows:

    Security: 290X2
+-----------+----------+----------+------+------+--------+---------+--------+----------------+---------+--------------+------------+
|SECURITY_ID|ACCOUNT_NO|COSTCENTER|    BU|   MPU|LONG_IND|SHORT_IND|QUANTITY|POS_NEG_QUANTITY|PROCESSED|ALLOC_QUANTITY|NET_QUANTITY|
+-----------+----------+----------+------+------+--------+---------+--------+----------------+---------+--------------+------------+
|      290X2|  3FA34789|    0800TS|BOXXBU|BOXXMP|    0101|     5279|   18063|               P|         |             0|           0|
|      290X2|  3FA34782|    0800TS|BOXXBU|BOXXMP|    0102|     5322|    -863|               N|         |             0|           0|
|      290X2|  3FA34789|    0800TS|BOXXBU|BOXXMP|    0101|     5279| -108926|               N|         |             0|           0|
+-----------+----------+----------+------+------+--------+---------+--------+----------------+---------+--------------+------------+

    Security: 35G71
+-----------+----------+----------+------+------+--------+---------+--------+----------------+---------+--------------+------------+
|SECURITY_ID|ACCOUNT_NO|COSTCENTER|    BU|   MPU|LONG_IND|SHORT_IND|QUANTITY|POS_NEG_QUANTITY|PROCESSED|ALLOC_QUANTITY|NET_QUANTITY|
+-----------+----------+----------+------+------+--------+---------+--------+----------------+---------+--------------+------------+
|      35G71|  92115301|    08036C|BOXXBU|BOXXMP|    0154|     8380|    8003|               P|         |             0|           0|
|      35G71|  92115302|    08036C|BOXXBU|BOXXMP|    0144|     8382|   -2883|               N|         |             0|           0|
+-----------+----------+----------+------+------+--------+---------+--------+----------------+---------+--------------+------------+

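I tried to unpack the second element of each row in df, i.e. t(1), and convert it to a DataFrame, but it failed. The case class I used is shown below: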

case class ExplodeWrappedArray(SECURITY_ID: String, ACCOUNT_NO: String, COSTCENTER: String, BU: String, MPU: String, LONG_IND: String, SHORT_IND: String, QUANTITY: String, POS_NEG_QUANTITY: String, PROCESSED: String, ALLOC_QUANTITY: Integer, NET_QUANTITY: Integer)

def printArray(df: DataFrame): Unit = {
    println("Hello")

    //This fails
    df.foreach(t => t(1).asInstanceOf[mutable.WrappedArray(ExplodeWrappedArray)])

    //This works
    //df.foreach(t => openList((t(1))))
}

Passing t(1) to openList and printing the WrappedArray directly works, but I want to unpack the array and convert it to a DataFrame.

def openList(a: mutable.WrappedArray[ExplodeWrappedArray]): Unit = {
    import sparkSession.implicits._
    println("Hello")

    //This works 
    println(a)

    //This fails
    val b = a.toDF("SECURITY_ID", "ACCOUNT_NO", "COSTCENTER", "BU", "MPU", "LONG_IND", "SHORT_IND", "QUANTITY", "POS_NEG_QUANTITY", "PROCESSED", "ALLOC_QUANTITY", "NET_QUANTITY")
    b.printSchema()
    b.show()
  }
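
For reference, here is a minimal sketch of one way the desired output might be produced without calling toDF inside foreach. It rests on assumptions that the question does not confirm: that combined_list has type array<array<string>>, and that the eighth element of each inner array (index 7, the repeated security id) is the one to drop. The names ExplodeCombinedList, columns and flatten are hypothetical, introduced only for this illustration. The idea is to let Spark expand the array with explode and then pick each position with getItem, so no DataFrame has to be created on the executors.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object ExplodeCombinedList {
  // Target column names from the question, paired with their position in each
  // inner array. Index 7 (the repeated security id) is skipped on purpose.
  val columns: Seq[(String, Int)] = Seq(
    "SECURITY_ID" -> 0, "ACCOUNT_NO" -> 1, "COSTCENTER" -> 2, "BU" -> 3, "MPU" -> 4,
    "LONG_IND" -> 5, "SHORT_IND" -> 6, "QUANTITY" -> 8, "POS_NEG_QUANTITY" -> 9,
    "PROCESSED" -> 10, "ALLOC_QUANTITY" -> 11, "NET_QUANTITY" -> 12)

  // Assumption: combined_list is array<array<string>>.
  // explode yields one output row per inner array; getItem reads one position.
  // All values stay strings here; add .cast("int") where numeric types are needed.
  def flatten(df: DataFrame): DataFrame =
    df.select(explode(col("combined_list")).as("item"))
      .select(columns.map { case (name, idx) => col("item").getItem(idx).as(name) }: _*)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-sketch")
      .getOrCreate()
    import spark.implicits._

    // Sample data copied from the question.
    val df = Seq(
      ("290X2", Seq(
        Seq("290X2", "3FA34789", "0800TS", "BOXXBU", "BOXXMP", "0101", "5279", "290X2", "18063", "P", "", "0", "0"),
        Seq("290X2", "3FA34782", "0800TS", "BOXXBU", "BOXXMP", "0102", "5322", "290X2", "-863", "N", "", "0", "0"),
        Seq("290X2", "3FA34789", "0800TS", "BOXXBU", "BOXXMP", "0101", "5279", "290X2", "-108926", "N", "", "0", "0"))),
      ("35G71", Seq(
        Seq("35G71", "92115301", "08036C", "BOXXBU", "BOXXMP", "0154", "8380", "35G71", "8003", "P", "", "0", "0"),
        Seq("35G71", "92115302", "08036C", "BOXXBU", "BOXXMP", "0144", "8382", "35G71", "-2883", "N", "", "0", "0")))
    ).toDF("SECURITY_ID", "combined_list")

    val flat = flatten(df)

    // Show one table per security; the ids are collected to the driver first.
    flat.select("SECURITY_ID").distinct().as[String].collect().foreach { id =>
      println(s"Security: $id")
      flat.filter(col("SECURITY_ID") === id).show(false)
    }

    spark.stop()
  }
}
```

If combined_list is actually an array of structs rather than an array of arrays, the second select could probably be replaced with select("item.*") followed by a rename of the fields. Note also that the per-security loop runs on the driver, which is reasonable only when the number of distinct securities is small.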

0 answers:

No answers