Unable to explode and select in the same expression in Spark Scala

Time: 2018-05-30 10:22:58

Tags: scala apache-spark apache-spark-sql

This is my schema:

root
 |-- DataPartition: string (nullable = true)
 |-- TimeStamp: string (nullable = true)
 |-- TRFCoraxData_instrumentId: long (nullable = true)
 |-- TRFCoraxData_organizationId: long (nullable = true)
 |-- Dividends: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- cr:AnnouncementDate: string (nullable = true)
 |    |    |-- cr:CorporateActionAdjustedDividendGrossAmount: double (nullable = true)
 |    |    |-- cr:CorporateActionAdjustedDividendNetAmount: double (nullable = true)
 |    |    |-- cr:CurrencyId: long (nullable = true)
 |    |    |-- cr:DividendEventId: long (nullable = true)
 |    |    |-- cr:DividendGrossAmount: double (nullable = true)
 |    |    |-- cr:DividendNetAmount: double (nullable = true)
 |    |    |-- cr:DividendType: string (nullable = true)
 |    |    |-- cr:ExDate: string (nullable = true)
 |    |    |-- cr:PayDate: string (nullable = true)
 |    |    |-- cr:PeriodDuration: string (nullable = true)
 |    |    |-- cr:PeriodEndDate: string (nullable = true)
 |    |    |-- cr:RecordDate: string (nullable = true)
 |-- FFAction|!|: string (nullable = true)

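I want to explode the array and select all of its columns in the same expression, so that I do not have to write withColumn or select with each column name given individually.

This is my code for the explode: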

 val temp2 = temp1.select(getDataPartition($"DataPartition").as("DataPartition"), $"TimeStamp".as("TimeStamp"), $"TRFCoraxData_instrumentId".as("TRFCoraxData_instrumentId"), $"TRFCoraxData_organizationId".as("TRFCoraxData_organizationId"),explode($"Dividends"), $"FFAction|!|".as("FFAction|!|"))
 val temp = temp2.select(temp2.columns.map(x => col(x).as(x.replace("cr:", ""))): _*)

temp.show(false)

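This is my output, where the exploded column comes back as a single column named col.

How can I get the individual column names in the same expression?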
+-----------------+-------------------------+-------------------------+---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
|DataPartition    |TimeStamp                |TRFCoraxData_instrumentId|TRFCoraxData_organizationId|col                                                                                                                                                                                    |FFAction|!||
+-----------------+-------------------------+-------------------------+---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+
|ThirdPartyPrivate|2017-06-07T09:18:33+00:00|8590925624               |4296241518                 |[2009-07-14T00:00:00+00:00,null,0.35,500110,73014469387,0.35,null,INTE,2009-08-13T00:00:00+00:00,2009-09-15T00:00:00+00:00,P3M,2009-09-30T00:00:00+00:00,2009-08-17T00:00:00+00:00]    |O|!|       |
|ThirdPartyPrivate|2017-06-07T09:18:33+00:00|8590925624               |4296241518                 |[2008-02-05T00:00:00+00:00,null,0.3,500110,73015860528,0.3,null,INTE,2008-02-14T00:00:00+00:00,2008-03-17T00:00:00+00:00,P3M,2008-03-31T00:00:00+00:00,2008-02-19T00:00:00+00:00]      |O|!|       |
|ThirdPartyPrivate|2017-06-07T09:18:33+00:00|8590925624               |4296241518                 |[2008-04-29T00:00:00+00:00,null,0.3,500110,73015864496,0.3,null,INTE,2008-05-14T00:00:00+00:00,2008-06-16T00:00:00+00:00,P3M,2008-06-30T00:00:00+00:00,2008-05-16T00:00:00+00:00]      |O|!|       |
+-----------------+-------------------------+-------------------------+---------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+

1 answer:

Answer 0 (score: 1)

  

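"How can I get the individual column names in the same expression?"

col is the default name that Spark itself gives to the column produced by explode. If you want a name other than col, you can alias it just as you did for the other columns: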
explode($"Dividends").as("Dividends")

You can then use .* to expand the exploded struct into separate columns:

temp2.select(col("Dividends.*"))
  

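"I want to explode and select all the columns in the same expression, so that I do not have to write withColumn or select with each column name given individually"

Only one generator (such as explode) can be used in a single select expression.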
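Putting the pieces of this answer together, the following is a minimal sketch rather than the asker's actual job. It assumes a local SparkSession and a cut-down sample frame with only two fields inside Dividends; the names ExplodeSketch, sampleData and explodeAndFlatten are hypothetical, and the asker's getDataPartition UDF is left out because its definition is not shown. The explode and the .* expansion are still two select steps, but both are driven by the column list, so no column name has to be typed by hand.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

object ExplodeSketch {

  // Hypothetical sample data: a reduced version of the schema in the question.
  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("ThirdPartyPrivate", 8590925624L))
      .toDF("DataPartition", "TRFCoraxData_instrumentId")
      .withColumn("Dividends", array(
        struct(lit("2009-08-13").as("cr:ExDate"), lit(0.35).as("cr:DividendGrossAmount")),
        struct(lit("2008-02-14").as("cr:ExDate"), lit(0.3).as("cr:DividendGrossAmount"))))
      .withColumn("FFAction|!|", lit("O|!|"))
  }

  def explodeAndFlatten(df: DataFrame): DataFrame = {
    // Step 1: explode the array and alias the result instead of keeping "col".
    val exploded = df.select(df.columns.map {
      case "Dividends" => explode(col("Dividends")).as("Dividends")
      case other       => col(other)
    }: _*)

    // Step 2: expand the struct in place with ".*"; other columns pass through.
    val flat = exploded.select(exploded.columns.map {
      case "Dividends" => col("Dividends.*")
      case other       => col(other)
    }: _*)

    // Step 3: drop the "cr:" prefix; toDF renames by position.
    flat.toDF(flat.columns.map(_.stripPrefix("cr:")): _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-sketch")
      .getOrCreate()
    explodeAndFlatten(sampleData(spark)).show(false)
    spark.stop()
  }
}
```

The sketch assumes that no top-level column name contains a dot, since col would then need the name wrapped in backticks.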