Question

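I am using Spark / Scala, and I want to fill the nulls in my DataFrame with default values based on the column type.

That is, String columns -> "string", Numeric columns -> 111, Boolean columns -> false, etc.

Currently the DF.na functions API provides na.fill, for example
fill(valueMap: Map[String, Any]), used like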

df.na.fill(Map(
    "A" -> "unknown",
    "B" -> 1.0
))

This requires knowing the column names as well as the column types.

OR

fill(value: String, cols: Seq[String])

This is available only for String / Double values, not even Boolean.

Is there a smart way to do this?

Answer 1

Take a look at dtypes: Array[(String, String)]. You can use the output of this method to generate a Map for fill, e.g.:

// collect skips columns whose type has no case here, so no MatchError is thrown
val typeMap: Map[String, Any] = df.dtypes.collect {
    case (name, "IntegerType") => name -> 0
    case (name, "StringType") => name -> ""
    case (name, "DoubleType") => name -> 0.0
}.toMap
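
To get the defaults asked for in the question, the same idea can be wrapped in a small helper. This is only a sketch: the object name FillDefaults and the method defaultsFor are hypothetical, and the set of type names covered is an assumption you would extend for your own schema. It takes the output of df.dtypes as a plain array, so it can be tried without a Spark session.

```scala
// Hypothetical helper, not part of Spark: builds a fill map from dtypes output.
object FillDefaults {
  // dtypes is expected to look like df.dtypes: Array of (columnName, typeName).
  def defaultsFor(dtypes: Array[(String, String)]): Map[String, Any] =
    dtypes.collect {
      case (name, "StringType")  => name -> "string"
      case (name, "IntegerType") => name -> 111
      case (name, "LongType")    => name -> 111L
      case (name, "DoubleType")  => name -> 111.0
      case (name, "BooleanType") => name -> false
      // Any other type (e.g. TimestampType) has no case and is left out.
    }.toMap
}
```

It would then be used as df.na.fill(FillDefaults.defaultsFor(df.dtypes)). Columns whose type is not listed are simply left out of the map, so their nulls stay untouched.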
