Converting data from a Mongo schema to Apache Spark

Time: 2019-09-13 14:05:25

Tags: mongodb apache-spark

I am trying to read data from Mongo using a defined Spark schema. I can handle the struct fields, but I cannot work out how to write the schema for an array, and I have not found a suitable solution anywhere.

Can someone help me with this?

Mongo structure (input data):

root
 |-- _id: struct (nullable = true)
 |    |-- oid: string (nullable = true)
 |-- addresses: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- street: string (nullable = true)
 |    |    |-- city: string (nullable = true)
 |    |    |-- state: string (nullable = true)
 |    |    |-- zip: string (nullable = true)
 |-- name: string (nullable = true)



The schema below is not working because the `addresses` field contains multiple elements, and I am not sure how to use ArrayType for array data. Could you please check it and suggest what to change?

```python
schema = StructType([
    StructField("_id", StructType([
        StructField("oid", StringType(), True)
    ]), True),
    StructField("addresses", StructType([
        StructField("element", StructType([
            StructField("street", StringType(), True),
            StructField("city", StringType(), True),
            StructField("state", StringType(), True),
            StructField("zip", StringType(), True)
        ]), True)
    ]), True),
    StructField("name", StringType(), True)
])
```

0 Answers:

No answers yet.