Prefix all Spark DataFrame columns except the primary key columns

Time: 2017-09-05 22:33:09

Tags: scala apache-spark-sql

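Below is my code for adding a prefix to the column names. I want to exclude one or more primary key columns from the prefixing. My primaryKeys is an array of strings that may contain one or more primary key fields.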

val primaryKeys = args(2).split("-")

val prefix = "w1."
val renamedColumns = df.columns.map(c=> df(c).as(s"$prefix$c"))
val dfNew = df.select(renamedColumns: _*)

val prefix2 = "w2."
val renamedColumns2 = df2.columns.map(c2=> df2(c2).as(s"$prefix2$c2"))
val df2New = df2.select(renamedColumns2: _*)

If there is only one primary key column, I can rename it back with withColumnRenamed, but I can't do it when there are multiple primary key columns.

I can't get something like this to work:

for (primaryKey <- primaryKeys) {
 dfNew.withColumnRenamed("$PREFIX1"+s"${primaryKey}",s"$primaryKey").toDF()
}
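The loop above has two problems. First, "$PREFIX1" has no s interpolator, so it is the literal text $PREFIX1 rather than w1. (and the snippet never defines a PREFIX1; the prefix variable is called prefix). Second, withColumnRenamed returns a new DataFrame instead of modifying dfNew, so each renamed result is discarded. A minimal sketch of the same "prefix everything, then rename the keys back" idea, threading the DataFrame through foldLeft, might look like this. It assumes a local SparkSession and reuses the w1. prefix; the helper name restoreKeys, the object name and the sample data are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object UndoPrefixForKeys {
  // Hypothetical helper: rename "<prefix><key>" back to "<key>" for every key column.
  // foldLeft feeds the DataFrame returned by each withColumnRenamed call into the next,
  // so none of the renames is lost.
  def restoreKeys(prefixed: DataFrame, prefix: String, keys: Seq[String]): DataFrame =
    keys.foldLeft(prefixed) { (acc, key) =>
      acc.withColumnRenamed(s"$prefix$key", key) // note the s-interpolator on the old name
    }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration; in spark-shell `spark` already exists
    val spark = SparkSession.builder().master("local[*]").appName("undo-prefix").getOrCreate()
    import spark.implicits._

    val df = Seq(("1", "a", "c1"), ("2", "b", "c2")).toDF("pk1", "pk2", "col1")
    val primaryKeys = Array("pk1", "pk2")
    val prefix = "w1."

    // Prefix every column first, as in the question
    val dfNew = df.select(df.columns.map(c => df(c).as(s"$prefix$c")): _*)

    val restored = restoreKeys(dfNew, prefix, primaryKeys.toSeq)
    println(restored.columns.mkString(",")) // pk1,pk2,w1.col1

    spark.stop()
  }
}
```

This works, but it renames the key columns twice; building the select list conditionally in one pass avoids the extra step.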

Can anyone help?

1 answer:

Answer 0 (score: 1)

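If I understand your question correctly, you can build renamedColumns conditionally so that only the non-primary-key columns get the prefix, as shown below: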

// Assumes spark-shell, where spark.implicits._ (needed for toDF) is already imported
val df = Seq(
  ("1", "a", "c1", "d1"),
  ("2", "b", "c2", "d2"),
  ("3", "c", "c3", "d3")
).toDF("pk1", "pk2", "col1", "col2")

val primaryKeys = Array("pk1", "pk2")
val prefix = "w1."

val renamedColumns = df.columns.map(
  c => if ( primaryKeys contains c ) df(c).as(c) else df(c).as(s"$prefix$c")
)

val dfNew = df.select(renamedColumns: _*)

dfNew.show
+---+---+-------+-------+
|pk1|pk2|w1.col1|w1.col2|
+---+---+-------+-------+
|  1|  a|     c1|     d1|
|  2|  b|     c2|     d2|
|  3|  c|     c3|     d3|
+---+---+-------+-------+
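Since the question prefixes two DataFrames (w1. and w2.), the conditional select can be wrapped in a small helper and applied to both. The sketch below assumes, though the question does not say so, that the goal is to join df and df2 on the primary keys; the helper name prefixNonKeys, the object name and the sample data are hypothetical. Because the key columns keep their original names, join with a Seq of column names works and each key column appears only once in the result. Also note that a column name containing a dot, such as w1.col1, has to be wrapped in backticks when it is referenced by a string (col("`w1.col1`")); otherwise Spark reads the dot as struct field access.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object PrefixAndJoin {
  // Hypothetical helper: prefix every column except the given key columns
  def prefixNonKeys(df: DataFrame, prefix: String, keys: Set[String]): DataFrame =
    df.select(df.columns.map { c =>
      if (keys.contains(c)) df(c) else df(c).as(s"$prefix$c")
    }: _*)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for illustration; in spark-shell `spark` already exists
    val spark = SparkSession.builder().master("local[*]").appName("prefix-join").getOrCreate()
    import spark.implicits._

    val df  = Seq(("1", "a", "c1"), ("2", "b", "c2")).toDF("pk1", "pk2", "col1")
    val df2 = Seq(("1", "a", "x1"), ("2", "b", "x2")).toDF("pk1", "pk2", "col1")
    val primaryKeys = Array("pk1", "pk2")

    val dfNew  = prefixNonKeys(df, "w1.", primaryKeys.toSet)
    val df2New = prefixNonKeys(df2, "w2.", primaryKeys.toSet)

    // Assumption: an inner equi-join on the unprefixed key columns
    val joined = dfNew.join(df2New, primaryKeys.toSeq)
    println(joined.columns.mkString(","))

    // Dotted column names need backticks when resolved from a string
    joined.select(col("`w1.col1`"), col("`w2.col1`")).show()

    spark.stop()
  }
}
```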