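I am trying to read data from Mongo using a predefined Spark schema. I am comfortable with struct fields, but I cannot work out how to write the schema for an array, and I have not found a suitable solution anywhere.
Can someone help me with this?
Mongo structure (input data):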
root
|-- _id: struct (nullable = true)
| |-- oid: string (nullable = true)
|-- addresses: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- street: string (nullable = true)
| | |-- city: string (nullable = true)
| | |-- state: string (nullable = true)
| | |-- zip: string (nullable = true)
|-- name: string (nullable = true)
The schema below does not work because `addresses` contains multiple elements, and I am not sure how to use ArrayType for array data. Could you please check it and suggest what to change?
`schema = StructType([
    StructField("_id", StructType([
        StructField("oid", StringType(), True)
    ]), True),
    StructField("addresses",
        StructType([
            StructField("element",
                StructType([
                    StructField("street", StringType(), True),
                    StructField("city", StringType(), True),
                    StructField("state", StringType(), True),
                    StructField("zip", StringType(), True)
                ]), True)
        ]), True),
    StructField("name", StringType(), True)
])`