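Is there a generic way to change the nullable property of every element of a given StructType? The StructType may be nested.
I see that @eliasah marked this as a duplicate of "Spark Dataframe column nullable property change". The two questions are different, though: that answer only works for one level and does not handle a hierarchical/nested StructType.
For example: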
root
 |-- user_id: string (nullable = false)
 |-- name: string (nullable = false)
 |-- system_process: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- timestamp: long (nullable = false)
 |    |    |-- process: string (nullable = false)
 |-- type: string (nullable = false)
 |-- user_process: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- timestamp: long (nullable = false)
 |    |    |-- process: string (nullable = false)
I want to change nullable to true for all elements, so the result should be:
root
 |-- user_id: string (nullable = true)
 |-- name: string (nullable = true)
 |-- system_process: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- timestamp: long (nullable = true)
 |    |    |-- process: string (nullable = true)
 |-- type: string (nullable = true)
 |-- user_process: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- timestamp: long (nullable = true)
 |    |    |-- process: string (nullable = true)
Here is a sample JSON schema for the StructType, for convenient testing:
import org.apache.spark.sql.types.{DataType, StructType}
val jsonSchema="""{"type":"struct","fields":[{"name":"user_id","type":"string","nullable":false,"metadata":{}},{"name":"name","type":"string","nullable":false,"metadata":{}},{"name":"system_process","type":{"type":"array","elementType":{"type":"struct","fields":[{"name":"timestamp","type":"long","nullable":false,"metadata":{}},{"name":"process_id","type":"string","nullable":false,"metadata":{}}]},"containsNull":false},"nullable":false,"metadata":{}},{"name":"type","type":"string","nullable":false,"metadata":{}},{"name":"user_process","type":{"type":"array","elementType":{"type":"struct","fields":[{"name":"timestamp","type":"long","nullable":false,"metadata":{}},{"name":"process_id","type":"string","nullable":false,"metadata":{}}]},"containsNull":false},"nullable":false,"metadata":{}}]}"""
DataType.fromJson(jsonSchema).asInstanceOf[StructType].printTreeString()
Answer 0 (score: 1)
I finally figured out two solutions:
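First, a string-replacement trick: replace the flag in the schema's JSON string, then create a StructType instance from the result. This rewrites only the nullable flags; containsNull is left as it was.
DataType.fromJson(schema.json.replaceAll("\"nullable\":false", "\"nullable\":true")).asInstanceOf[StructType]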
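Second, a recursive method:
import org.apache.spark.sql.types.{ArrayType, StructType}

// Returns a copy of structType in which every field reached through
// nested structs and arrays of structs is marked nullable.
def updateFieldsToNullable(structType: StructType): StructType = {
  StructType(structType.map(f => f.dataType match {
    case d: ArrayType =>
      // Recurse only when the array elements are structs.
      val element = d.elementType match {
        case s: StructType => updateFieldsToNullable(s)
        case _ => d.elementType
      }
      // The array's containsNull flag is kept as-is.
      f.copy(nullable = true, dataType = ArrayType(element, d.containsNull))
    case s: StructType => f.copy(nullable = true, dataType = updateFieldsToNullable(s))
    case _ => f.copy(nullable = true)
  }))
}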
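If you also need containsNull = true, as in the expected output in the question, or if the schema can contain maps or arrays of arrays, one option is to recurse over DataType instead of StructType. The following is only a sketch under those assumptions: NullableSchema, makeNullable and makeSchemaNullable are names made up for this example, not part of Spark.

```scala
import org.apache.spark.sql.types._

object NullableSchema {
  // Rebuild a DataType with every nullability flag set to true.
  def makeNullable(dt: DataType): DataType = dt match {
    case s: StructType =>
      StructType(s.fields.map { f =>
        f.copy(dataType = makeNullable(f.dataType), nullable = true)
      })
    case a: ArrayType =>
      // Unlike the method above, this also flips containsNull.
      ArrayType(makeNullable(a.elementType), containsNull = true)
    case m: MapType =>
      // Map keys have no nullability flag; only the value side does.
      MapType(makeNullable(m.keyType), makeNullable(m.valueType), valueContainsNull = true)
    case other => other // atomic types carry no nullability flag of their own
  }

  // Convenience wrapper for the common case of a top-level schema.
  def makeSchemaNullable(schema: StructType): StructType =
    makeNullable(schema).asInstanceOf[StructType]
}
```

With the sample schema from the question, NullableSchema.makeSchemaNullable(DataType.fromJson(jsonSchema).asInstanceOf[StructType]).printTreeString() should show nullable = true and containsNull = true at every level.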