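I am trying to rename the fields inside an array of struct types. Below are the schema I have and the schema I want. Only part of the schema is shown; the DataFrame has n other columns.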
Input schema
|-- n other columns
|-- state_playback_segmentInfo: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- isAd: boolean (nullable = true)
| | |-- queryParameters: string (nullable = true)
| | |-- sequenceNumber: integer (nullable = true)
| | |-- segmentUrl: string (nullable = true)
| | |-- sizeBytes: integer (nullable = true)
| | |-- downloadDurationMs: integer (nullable = true)
| | |-- ipAddress: string (nullable = true)
| | |-- location: string (nullable = true)
Output schema
|-- n other columns
|-- state__playback__segmentInfo: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- state__playback__segmentInfo__isAd: boolean (nullable = true)
| | |-- state__playback__segmentInfo__queryParameters: string (nullable = true)
| | |-- state__playback__segmentInfo__sequenceNumber: integer (nullable = true)
| | |-- state__playback__segmentInfo__segmentUrl: string (nullable = true)
| | |-- state__playback__segmentInfo__sizeBytes: integer (nullable = true)
| | |-- state__playback__segmentInfo__downloadDurationMs: integer (nullable = true)
| | |-- state__playback__segmentInfo__ipAddress: string (nullable = true)
| | |-- state__playback__segmentInfo__location: string (nullable = true)
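I have written a function that flattens the nested StructType fields of a DataFrame; see the code below.
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

// Flattens nested StructType fields into top-level columns named prefix + delimeter + field.
// Note: the recursive call passes the delimited name as prefix, so the dotted path is only
// correct up to two levels of nesting unless delimeter is ".".
def flattenDF(schema: StructType, delimeter: String, prefix: String): Array[Column] = {
  schema.fields.flatMap(structField => {
    // Dotted path used to select the field from the DataFrame
    val codeColName = if (prefix == null) structField.name else prefix + "." + structField.name
    // Flattened output name built with the delimiter
    val colName = if (prefix == null) structField.name else prefix + delimeter + structField.name
    structField.dataType match {
      case st: StructType => flattenDF(schema = st, delimeter = delimeter, prefix = colName)
      case _ => Array(col(codeColName).alias(colName)) // arrays of structs also land here, unchanged
    }
  })
}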
Any help with this case, or pointers to relevant references, would be appreciated.
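One possible direction, offered as a sketch rather than a tested solution: since flattenDF only matches StructType, an array column falls through to the default case and its element fields keep their names. Instead of selecting columns, the array's data type can be copied with renamed fields and the column cast to that new type. This relies on Spark casting struct to struct by field position, so only the names change. The names RenameNestedFields, prefixFields, and renameArrayStructFields are hypothetical helpers introduced for illustration, as is the choice to pass the new column name explicitly.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

object RenameNestedFields {
  // Returns a copy of `dataType` in which every struct field is renamed to
  // prefix + delimiter + originalName. Arrays are traversed; field order,
  // data types, nullability, and metadata are left unchanged.
  def prefixFields(dataType: DataType, prefix: String, delimiter: String): DataType =
    dataType match {
      case st: StructType =>
        StructType(st.fields.map { f =>
          val newName = prefix + delimiter + f.name
          f.copy(name = newName, dataType = prefixFields(f.dataType, newName, delimiter))
        })
      case ArrayType(elementType, containsNull) =>
        ArrayType(prefixFields(elementType, prefix, delimiter), containsNull)
      case other => other // primitives, maps, etc. are returned as-is
    }

  // Hypothetical wrapper: casts `column` to the renamed type, then renames the
  // column itself to `newName`, which is also used as the field prefix.
  def renameArrayStructFields(df: DataFrame, column: String, newName: String,
                              delimiter: String): DataFrame = {
    val newType = prefixFields(df.schema(column).dataType, newName, delimiter)
    df.withColumn(column, col(column).cast(newType))
      .withColumnRenamed(column, newName)
  }
}
```

Under those assumptions, a call such as RenameNestedFields.renameArrayStructFields(df, "state_playback_segmentInfo", "state__playback__segmentInfo", "__") should yield element fields like state__playback__segmentInfo__isAd, and the result can be checked with printSchema().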