For example, the original DataFrame looks like this:
+--------+--------+
|    col1|    col2|
+--------+--------+
|    null|       A|
|       B|    null|
|       C|       D|
|    null|    null|
+--------+--------+
I want to concatenate col1 and col2 to get the following DataFrame:
+--------+--------+-------------------+
|    col1|    col2|               col3|
+--------+--------+-------------------+
|    null|       A|         "{col2:A}"|
|       B|    null|         "{col1:B}"|
|       C|       D| "{col1:C, col2:D}"|
|    null|    null|               "{}"|
+--------+--------+-------------------+
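The new col3 is built by concatenating only the non-null values of col1 and col2, each prefixed with its column name. col3 is a string column. How can I add a null check to the concat method?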
Answer 0 (score: 2)
You can combine the columns into an array:
import org.apache.spark.sql.functions._
// toDF needs spark.implicits._, which spark-shell imports automatically
val df = Seq((null, "A"), ("B", null), ("C", "D"), (null, null)).toDF("colA", "colB")
val cols = array(df.columns.map(c =>
  // If the column is not null, emit "name:value"; otherwise the element is null
  when(col(c).isNotNull, concat_ws(":", lit(c), col(c)))): _*
)
and use a UserDefinedFunction to drop the null entries and join the rest:
val combine = udf((xs: Seq[String]) => {
  // Drop the null entries, join the rest with commas, and wrap the result in braces
  val tmp = xs.filter { _ != null }.mkString(",")
  s"{$tmp}"
})
df.withColumn("col3", combine(cols)).show
// +----+----+---------------+
// |colA|colB|           col3|
// +----+----+---------------+
// |null|   A|       {colB:A}|
// |   B|null|       {colA:B}|
// |   C|   D|{colA:C,colB:D}|
// |null|null|             {}|
// +----+----+---------------+
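A possible variant without the UserDefinedFunction: concat_ws skips null inputs, so the per-column "name:value" expressions can be passed to it directly and the braces added with concat. This is only a minimal sketch, assuming a local SparkSession; the names pairs and result are illustrative and not part of the answer above, and createDataFrame is used instead of toDF on a Seq so that no implicits import is needed.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Assumption: a local session for illustration; in spark-shell this returns the existing one
val spark = SparkSession.builder().master("local[*]").appName("concat-non-null").getOrCreate()

val df = spark
  .createDataFrame(Seq[(String, String)]((null, "A"), ("B", null), ("C", "D"), (null, null)))
  .toDF("colA", "colB")

// One expression per column: "name:value" when the value is present, null otherwise
val pairs = df.columns.map(c => when(col(c).isNotNull, concat_ws(":", lit(c), col(c))))

// concat_ws ignores the null entries; for an all-null row it yields "", so col3 becomes "{}"
val result = df.withColumn("col3", concat(lit("{"), concat_ws(",", pairs: _*), lit("}")))
result.show()
```

The when(...) guard is still needed here: without it, concat_ws(":", lit(c), col(c)) would return just the column name for a null value instead of null.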