Question

I have a DataFrame named products, as follows:

Credit | Savings | Premium
1        0         1
0        1         1
1        1         0

All column values are Strings.

I want to convert it to the following:

Credit | Savings | Premium
Credit   0         Premium
0        Savings   Premium
Credit   Savings   0

How can I do this in Spark?

I'm using Spark 1.6.2 in Zeppelin.

Answer 1

I assume Credit, Savings and Premium are string columns.

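import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions._ // for `when` and `lit`

val df: DataFrame = .....  // the products DataFrame

// DataFrame.na returns DataFrameNaFunctions; its replace takes a Scala Map,
// so no Java ImmutableMap is needed. Column names must match exactly.
df.na.replace("Credit", Map("1" -> "Credit"))
  .na.replace("Savings", Map("1" -> "Savings"))
  .na.replace("Premium", Map("1" -> "Premium"))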

Alternatively, you can do this:

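// Each UDF must be applied to its input column; withColumn expects a Column
df.withColumn("Credit", udf1(df("Credit")))
  .withColumn("Savings", udf2(df("Savings")))
  .withColumn("Premium", udf3(df("Premium")))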

where udf1, udf2 and udf3 are Spark UDFs that convert "1" to the corresponding column name.
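The answer does not show how udf1, udf2 and udf3 are defined. Here is a minimal sketch, assuming the Spark 1.6 Scala API. The object name ColumnNameUdfs and the helpers labelIfOne and applyUdfs are hypothetical names chosen for illustration. Values other than "1" are passed through unchanged.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, udf}

// Hypothetical helper object; these names are illustrative, not part of Spark.
object ColumnNameUdfs {
  // Builds a UDF that returns `name` when a cell equals "1"
  // and returns the cell unchanged otherwise (null stays null).
  def labelIfOne(name: String) =
    udf((value: String) => if (value == "1") name else value)

  val udf1 = labelIfOne("Credit")
  val udf2 = labelIfOne("Savings")
  val udf3 = labelIfOne("Premium")

  // Applies each UDF to its own column; withColumn replaces an
  // existing column that has the same name.
  def applyUdfs(df: DataFrame): DataFrame =
    df.withColumn("Credit", udf1(col("Credit")))
      .withColumn("Savings", udf2(col("Savings")))
      .withColumn("Premium", udf3(col("Premium")))
}
```

Usage: ColumnNameUdfs.applyUdfs(products).show(). Building the three UDFs from one parameterized helper avoids repeating the same lambda once per column.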

Instead of UDFs, you can also use the when(cond, val).otherwise(val) syntax:

// Both branches return strings, so the fallback is "0" rather than the integer 0
df.withColumn("Credit", when(df("Credit") === "1", lit("Credit")).otherwise("0"))
  .withColumn("Savings", when(df("Savings") === "1", lit("Savings")).otherwise("0"))
  .withColumn("Premium", when(df("Premium") === "1", lit("Premium")).otherwise("0"))
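When there are many such columns, the same when(...).otherwise(...) expression can be built for every column in a loop instead of writing one withColumn call per column. This is a sketch assuming the Spark 1.6 Scala API; ColumnNameFill and fillWithColumnNames are hypothetical names. Unlike the snippet above, it keeps any value other than "1" as it is rather than replacing it with 0. For this data the result is the same, since the only other value is "0".

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit, when}

object ColumnNameFill {
  // Hypothetical helper: for every column, replaces the string "1" with the
  // column's own name and leaves all other values unchanged.
  def fillWithColumnNames(df: DataFrame): DataFrame =
    df.columns.foldLeft(df) { (acc, name) =>
      // withColumn with an existing name replaces that column in place,
      // so the original column order is preserved.
      acc.withColumn(name, when(col(name) === "1", lit(name)).otherwise(col(name)))
    }
}
```

Usage: ColumnNameFill.fillWithColumnNames(products).show(). Because it reads the names from df.columns, the same call also works if more product columns are added later.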

That's all. Good luck :-)

Change DataFrame values (conditionally) to the corresponding column names in Spark
