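I want to get all the columns of a DataFrame. If the DataFrame has a flat structure (no nested StructTypes), df.columns
produces the correct result. I would also like to get all nested column names, e.g.
Given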
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

val schema = StructType(
  StructField("name", StringType) ::
  StructField("nameSecond", StringType) ::
  StructField("nameDouble", StringType) ::
  StructField("someStruct", StructType(
    StructField("insideS", StringType) ::
    StructField("insideD", DoubleType) ::
    Nil
  )) ::
  Nil
)
val rdd = spark.sparkContext.emptyRDD[Row]
val df = spark.createDataFrame(rdd, schema)
I would like to get
Seq("name", "nameSecond", "nameDouble", "someStruct", "insideS", "insideD")
Answer 0 (score: 4)
You can use this recursive function to traverse the schema:
def flattenSchema(schema: StructType): Seq[String] = {
  schema.fields.flatMap {
    case StructField(name, inner: StructType, _, _) => Seq(name) ++ flattenSchema(inner)
    case StructField(name, _, _, _) => Seq(name)
  }
}
println(flattenSchema(schema))
// prints: ArraySeq(name, nameSecond, nameDouble, someStruct, insideS, insideD)
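The flat list above does not say which struct a nested field belongs to. If that matters, a variant that builds dotted paths may be useful. This is only a sketch and not part of the answer above: flattenSchemaQualified and its prefix parameter are hypothetical names, it returns leaf fields only (the struct itself is skipped), and it does not descend into arrays or maps. The schema from the question is repeated so the snippet can be pasted into spark-shell on its own.

```scala
import org.apache.spark.sql.types._

// Same example schema as in the question.
val schema = StructType(
  StructField("name", StringType) ::
  StructField("nameSecond", StringType) ::
  StructField("nameDouble", StringType) ::
  StructField("someStruct", StructType(
    StructField("insideS", StringType) ::
    StructField("insideD", DoubleType) ::
    Nil
  )) ::
  Nil
)

// Hypothetical helper: collects leaf field names, prefixing each nested
// field with the dotted path of its parent structs.
def flattenSchemaQualified(schema: StructType, prefix: Option[String] = None): Seq[String] =
  schema.fields.toSeq.flatMap { field =>
    // Build "parent.child" when a prefix exists, otherwise just the field name.
    val fullName = prefix.map(p => s"$p.${field.name}").getOrElse(field.name)
    field.dataType match {
      case inner: StructType => flattenSchemaQualified(inner, Some(fullName)) // recurse into the struct
      case _                 => Seq(fullName)                                 // leaf column
    }
  }

// Yields: name, nameSecond, nameDouble, someStruct.insideS, someStruct.insideD
val qualified = flattenSchemaQualified(schema)
```

Dotted names like these should work with df.select(qualified.map(org.apache.spark.sql.functions.col): _*) for plain field names. Field names that themselves contain a dot would need backtick quoting, which this sketch does not handle.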