How to append a constant to all column names in the header in Spark Scala

Time: 2018-02-15 07:49:10

Tags: scala apache-spark spark-dataframe

For example, this is my existing header:

DataPartition|^|TimeStamp|^|Source.organizationId|^|Source.sourceId|^|FilingDateTime|^|SourceTypeCode|^|DocumentId|^|Dcn|^|DocFormat|^|StatementDate|^|IsFilingDateTimeEstimated|^|ContainsPreliminaryData|^|CapitalChangeAdjustmentDate|^|CumulativeAdjustmentFactor|^|ContainsRestatement|^|FilingDateTimeUTCOffset|^|ThirdPartySourceCode|^|ThirdPartySourcePriority|^|SourceTypeId|^|ThirdPartySourceCodeId|^|FFAction|!|

I want to create a header like the following:

DataPartition_1|^|TimeStamp|^|Source.organizationId|^|Source.sourceId|^|FilingDateTime_1|^|SourceTypeCode_1|^|DocumentId_1|^|Dcn_1|^|DocFormat_1|^|StatementDate_1|^|IsFilingDateTimeEstimated_1|^|ContainsPreliminaryData_1|^|CapitalChangeAdjustmentDate_1|^|CumulativeAdjustmentFactor_1|^|ContainsRestatement_1|^|FilingDateTimeUTCOffset_1|^|ThirdPartySourceCode_1|^|ThirdPartySourcePriority_1|^|SourceTypeId_1|^|ThirdPartySourceCodeId_1|^|FFAction_1

I want to append _1 to every column name in the header except for the TimeStamp, Source.organizationId, and Source.sourceId columns.

I got this working with withColumn, but that way I have to write a call for every single column.

Is there a simple way to do this using foldLeft?

1 Answer:

Answer 0: (score: 1)

First, define the list of columns you want to skip:

val columnsToAvoid = List("TimeStamp","Source.organizationId","Source.sourceId")

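Then you can foldLeft over the DataFrame's list of column names (given by df.columns), with the DataFrame itself as the initial accumulator. For each column name, rename the column if it is not in columnsToAvoid; otherwise return the DataFrame unchanged.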

df.columns.foldLeft(df)((acc, elem) => 
                     if (columnsToAvoid.contains(elem)) acc 
                     else acc.withColumnRenamed(elem, elem+"_1"))
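The renaming rule itself does not depend on Spark, so it can be sketched and checked on a plain list of names. The following is only an illustrative sketch: the object `HeaderSuffix` and the helper `addSuffix` are hypothetical names that do not appear in the answer, and the sample header is a shortened version of the one in the question.

```scala
object HeaderSuffix {
  // Hypothetical helper (not part of the original answer):
  // appends `suffix` to every name that is not listed in `skip`.
  def addSuffix(columns: Seq[String], skip: Set[String], suffix: String): Seq[String] =
    columns.map(name => if (skip.contains(name)) name else name + suffix)

  def main(args: Array[String]): Unit = {
    val columnsToAvoid = Set("TimeStamp", "Source.organizationId", "Source.sourceId")

    // Shortened sample of the header from the question.
    val header = Seq(
      "DataPartition", "TimeStamp", "Source.organizationId",
      "Source.sourceId", "FilingDateTime", "FFAction"
    )

    val renamed = addSuffix(header, columnsToAvoid, "_1")

    // Print the new header using the question's |^| delimiter.
    println(renamed.mkString("|^|"))
  }
}
```

With a DataFrame `df`, the computed names could presumably be applied in a single call, for example `df.toDF(HeaderSuffix.addSuffix(df.columns.toSeq, columnsToAvoid.toSet, "_1"): _*)`. This is an alternative to the foldLeft shown above, not what the answer itself uses; it replaces all column names at once instead of chaining one withColumnRenamed per column.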

Here is a simple example:

Original DF:

+-----+------+-----------+
| word| value|  TimeStamp|
+-----+------+-----------+
|wordA|valueA|45435345435|
|wordB|valueB|  454244345|
|wordC|valueC|32425425435|
+-----+------+-----------+

Operation:

df.columns.foldLeft(df)((acc, elem) => if (columnsToAvoid.contains(elem)) acc else acc.withColumnRenamed(elem, elem+"_1")).show

Result:

+------+-------+-----------+
|word_1|value_1|  TimeStamp|
+------+-------+-----------+
| wordA| valueA|45435345435|
| wordB| valueB|  454244345|
| wordC| valueC|32425425435|
+------+-------+-----------+