How to drop or filter nested columns in Scala

Asked: 2020-06-29 23:37:04

Tags: scala apache-spark


root
 |-- _id: string (nullable = true)
 |-- h: string (nullable = true)
 |-- inc: string (nullable = true)
 |-- op: string (nullable = true)
 |-- ts: string (nullable = true)
 |-- webhooks: struct (nullable = false)
 |    |    |-- index: string (nullable = false)
 |    |    |-- failed_at: string (nullable = true)
 |    |    |-- status: string (nullable = true)
 |    |    |-- updated_at: string (nullable = true)

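How can I remove columns from the `webhooks` struct based on a list passed as input, e.g. `filterList: List[String] = List("index", "status")`? Is there a way to do this by iterating over the rows, so that the intermediate schema does not change the final schema?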

root
 |-- _id: string (nullable = true)
 |-- h: string (nullable = true)
 |-- inc: string (nullable = true)
 |-- op: string (nullable = true)
 |-- ts: string (nullable = true)
 |-- webhooks: struct (nullable = false)
 |    |    |-- index: string (nullable = false)
 |    |    |-- status: string (nullable = true)

2 answers:

Answer 0 (score: 1)

Check the code below.

scala> df.printSchema
root
 |-- _id: string (nullable = true)
 |-- h: string (nullable = true)
 |-- inc: string (nullable = true)
 |-- op: string (nullable = true)
 |-- ts: string (nullable = true)
 |-- webhooks: struct (nullable = true)
 |    |-- index: string (nullable = true)
 |    |-- failed_at: string (nullable = true)
 |    |-- status: string (nullable = true)
 |    |-- updated_at: string (nullable = true)

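scala> import org.apache.spark.sql.functions.{col, struct}

scala> // Names of the fields currently inside the webhooks struct
scala> val actualColumns = df.select("webhooks.*").columns

scala> // Fields to remove from the struct
scala> val removeColumns = Seq("index", "status")

scala> // Rebuild the struct from the remaining fields only
scala> val webhooks = struct(actualColumns.filterNot(c => removeColumns.contains(c)).map(c => col(s"webhooks.$c")): _*).as("webhooks")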

Output

scala> df.withColumn("webhooks",webhooks).printSchema
root
 |-- _id: string (nullable = true)
 |-- h: string (nullable = true)
 |-- inc: string (nullable = true)
 |-- op: string (nullable = true)
 |-- ts: string (nullable = true)
 |-- webhooks: struct (nullable = false)
 |    |-- failed_at: string (nullable = true)
 |    |-- updated_at: string (nullable = true)
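
Note that the snippet above drops the listed fields, while the expected schema in the question keeps only `index` and `status`. The following is a minimal sketch that covers both directions, assuming the parent column is a struct and its field names contain no dots. The object name `NestedFields` and the helpers `dropNested` and `keepNested` are hypothetical names introduced here for illustration; they are not part of Spark.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.{col, struct}
import org.apache.spark.sql.types.StructType

// Hypothetical helper object, not a Spark API.
object NestedFields {
  // Names of the direct children of a struct column, read from the schema.
  private def childNames(df: DataFrame, parent: String): Seq[String] =
    df.schema(parent).dataType match {
      case st: StructType => st.fieldNames.toSeq
      case other =>
        throw new IllegalArgumentException(s"$parent is not a struct: $other")
    }

  // Rebuild the struct from the children for which `keep` returns true.
  private def rebuild(df: DataFrame, parent: String, keep: String => Boolean): DataFrame = {
    val kept: Seq[Column] =
      childNames(df, parent).filter(keep).map(c => col(s"$parent.$c"))
    require(kept.nonEmpty, s"no fields would remain in $parent")
    df.withColumn(parent, struct(kept: _*))
  }

  // Remove the listed fields from the struct.
  def dropNested(df: DataFrame, parent: String, names: Seq[String]): DataFrame =
    rebuild(df, parent, c => !names.contains(c))

  // Keep only the listed fields of the struct.
  def keepNested(df: DataFrame, parent: String, names: Seq[String]): DataFrame =
    rebuild(df, parent, c => names.contains(c))
}
```

With the question's `filterList`, `NestedFields.keepNested(df, "webhooks", filterList)` would keep only the listed fields, and `NestedFields.dropNested(df, "webhooks", filterList)` would reproduce the behaviour of the answer above. Both only rewrite the column expression, so no row-by-row iteration is needed.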

Answer 1 (score: 0)

See also https://stackoverflow.com/a/39943812/2204206

It can be more convenient when dropping deeply nested columns.
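
Separately from the linked answer, newer Spark versions offer a built-in alternative. The sketch below assumes Spark 3.1 or later, where `Column.dropFields` is available; the question does not state a Spark version, so this is an assumption. The object and method names are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical wrapper; only Column.dropFields is a Spark API (3.1+).
object DropFieldsSketch {
  // Drop the named fields from the webhooks struct without rebuilding it by hand.
  def dropWebhookFields(df: DataFrame, names: Seq[String]): DataFrame =
    df.withColumn("webhooks", col("webhooks").dropFields(names: _*))
}
```

For example, `DropFieldsSketch.dropWebhookFields(df, Seq("index", "status"))` should leave `failed_at` and `updated_at` in the struct.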
