This is my schema:
root
 |-- DataPartition: string (nullable = true)
 |-- TimeStamp: string (nullable = true)
 |-- TRFCoraxData_instrumentId: long (nullable = true)
 |-- TRFCoraxData_organizationId: long (nullable = true)
 |-- Dividends: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- cr:AnnouncementDate: string (nullable = true)
 |    |    |-- cr:CorporateActionAdjustedDividendGrossAmount: double (nullable = true)
 |    |    |-- cr:CorporateActionAdjustedDividendNetAmount: double (nullable = true)
 |    |    |-- cr:CurrencyId: long (nullable = true)
 |    |    |-- cr:DividendEventId: long (nullable = true)
 |    |    |-- cr:DividendGrossAmount: double (nullable = true)
 |    |    |-- cr:DividendNetAmount: double (nullable = true)
 |    |    |-- cr:DividendType: string (nullable = true)
 |    |    |-- cr:ExDate: string (nullable = true)
 |    |    |-- cr:PayDate: string (nullable = true)
 |    |    |-- cr:PeriodDuration: string (nullable = true)
 |    |    |-- cr:PeriodEndDate: string (nullable = true)
 |    |    |-- cr:RecordDate: string (nullable = true)
 |-- FFAction|!|: string (nullable = true)
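I want to explode the Dividends array and select all of its fields as columns in the same expression, so that I don't have to write withColumn or select calls that name each column individually.
This is my code for the explode: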
val temp2 = temp1.select(getDataPartition($"DataPartition").as("DataPartition"), $"TimeStamp".as("TimeStamp"), $"TRFCoraxData_instrumentId".as("TRFCoraxData_instrumentId"), $"TRFCoraxData_organizationId".as("TRFCoraxData_organizationId"),explode($"Dividends"), $"FFAction|!|".as("FFAction|!|"))
val temp = temp2.select(temp2.columns.map(x => col(x).as(x.replace("cr:", ""))): _*)
temp.show(false)
这是我的输出,我得到的地方我将爆炸列作为Col。
如何在同一个表达式
中获取coloumn名称+-----------------+-------------------------+-------------------------+---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
|DataPartition |TimeStamp |TRFCoraxData_instrumentId|TRFCoraxData_organizationId|col |FFAction|!||
+-----------------+-------------------------+-------------------------+---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
|ThirdPartyPrivate|2017-06-07T09:18:33+00:00|8590925624 |4296241518 |[2009-07-14T00:00:00+00:00,null,0.35,500110,73014469387,0.35,null,INTE,2009-08-13T00:00:00+00:00,2009-09-15T00:00:00+00:00,P3M,2009-09-30T00:00:00+00:00,2009-08-17T00:00:00+00:00] |O|!| |
|ThirdPartyPrivate|2017-06-07T09:18:33+00:00|8590925624 |4296241518 |[2008-02-05T00:00:00+00:00,null,0.3,500110,73015860528,0.3,null,INTE,2008-02-14T00:00:00+00:00,2008-03-17T00:00:00+00:00,P3M,2008-03-31T00:00:00+00:00,2008-02-19T00:00:00+00:00] |O|!| |
|ThirdPartyPrivate|2017-06-07T09:18:33+00:00|8590925624 |4296241518 |[2008-04-29T00:00:00+00:00,null,0.3,500110,73015864496,0.3,null,INTE,2008-05-14T00:00:00+00:00,2008-06-16T00:00:00+00:00,P3M,2008-06-30T00:00:00+00:00,2008-05-16T00:00:00+00:00] |O|!| |
+-----------------+-------------------------+-------------------------+---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
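Answer 0 (score: 1)
"How can I get the individual column names in the same expression?"
col is the default name that Spark itself gives to the column produced by explode. If you want a name other than col, add an alias: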
explode($"Dividends").as("Dividends")
You can then use .* to expand the exploded struct into separate columns:
temp2.select(col("Dividends.*"))
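"I want to explode and select all columns in the same expression, so that I don't have to write withColumn or select by naming each column individually"
Only one generator (such as explode) can be used per expression.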
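The two steps above (explode with an alias, then expand with .*) can be wrapped in a small helper so that no column has to be named by hand. The following is only a sketch under some assumptions: the helper name explodeAndFlatten, the object ExplodeAndFlatten, and the two-field sample data are made up for illustration and are not part of the original question, and the other column names are assumed not to contain dots or backticks. Renaming is done with toDF rather than col(x).as(...), which is a swapped-in choice to avoid resolving names that contain "cr:".

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

object ExplodeAndFlatten {

  // Hypothetical helper: explodes `arrayCol`, expands the resulting struct
  // into top-level columns, and strips `prefix` from every column name.
  def explodeAndFlatten(df: DataFrame, arrayCol: String, prefix: String): DataFrame = {
    // Step 1: explode in place, reusing the array column's own name as the alias
    // (otherwise Spark names the generated column "col").
    val exploded = df.select(
      df.columns.map(c => if (c == arrayCol) explode(col(c)).as(c) else col(c)): _*)

    // Step 2: in a second select, replace the struct column with "<name>.*".
    val flat = exploded.select(
      exploded.columns.map(c => if (c == arrayCol) col(s"$c.*") else col(c)): _*)

    // Step 3: rename positionally, dropping the prefix from each column name.
    flat.toDF(flat.columns.map(_.replace(prefix, "")): _*)
  }

  // Made-up sample data: a cut-down version of the schema in the question.
  def sampleInput(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("ThirdPartyPrivate", 8590925624L, "O|!|"))
      .toDF("DataPartition", "TRFCoraxData_instrumentId", "FFAction|!|")
      .select(
        $"DataPartition",
        $"TRFCoraxData_instrumentId",
        array(
          struct(lit("2009-08-13").as("cr:ExDate"), lit(0.35).as("cr:DividendGrossAmount")),
          struct(lit("2008-02-14").as("cr:ExDate"), lit(0.3).as("cr:DividendGrossAmount"))
        ).as("Dividends"),
        $"FFAction|!|")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("explode-sketch").getOrCreate()
    val result = explodeAndFlatten(sampleInput(spark), "Dividends", "cr:")
    result.printSchema()
    result.show(false)
    spark.stop()
  }
}
```

With the sample data, the result should have one row per array element, with the struct fields sitting where the Dividends column used to be. The explode and the .* expansion stay in separate select calls, because the alias given to the exploded column can only be referenced once the first select has produced it.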