Using the explode function to read a nested JSON file

Time: 2019-05-24 05:16:01

Tags: python json apache-spark pyspark

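I have a JSON file that contains nested JSON, and I would like to read the nested part with PySpark's explode function. Since I am new to this, I tried to use explode without creating a DataFrame, but I could not get the syntax right. How should explode be used? Is that the right approach, or do we have to create a DataFrame first and only then use explode? I have read a few answers on Stack Overflow but could not find what I needed. I would appreciate a simple explanation. Thanks in advance.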

1 Answer:

Answer 0: (score: -1)

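You can do it with the code below. explode is a column function, so the JSON is first read into a DataFrame and explode is then applied to its array columns, one nesting level at a time.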

source_json = """
{
    "persons": [
        {
            "name": "John",
            "age": 30,
            "cars": [
                {
                    "name": "Ford",
                    "models": [
                        "Fiesta",
                        "Focus",
                        "Mustang"
                    ]
                },
                {
                    "name": "BMW",
                    "models": [
                        "320",
                        "X3",
                        "X5"
                    ]
                }
            ]
        },
        {
            "name": "Peter",
            "age": 46,
            "cars": [
                {
                    "name": "Huyndai",
                    "models": [
                        "i10",
                        "i30"
                    ]
                },
                {
                    "name": "Mercedes",
                    "models": [
                        "E320",
                        "E63 AMG"
                    ]
                }
            ]
        }
    ]
}
"""
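from pyspark.sql.functions import explode, col

# dbutils is a Databricks utility; this writes the sample to DBFS (True = overwrite).
dbutils.fs.put("/tmp/source.json", source_json, True)

# The document spans several lines, so multiline mode is needed;
# by default Spark expects one JSON object per line.
source_df = spark.read.option("multiline", "true").json("/tmp/source.json")

# One row per element of the top-level "persons" array.
persons = source_df.select(explode("persons").alias("persons"))

# One row per (person, car), keeping the car's brand name.
persons_cars = persons.select(
    col("persons.name").alias("persons_name"),
    col("persons.age").alias("persons_age"),
    explode("persons.cars").alias("persons_cars_brands"),
    col("persons_cars_brands.name").alias("persons_cars_brand"),
)

# One row per (person, car, model).
persons_cars_models = persons_cars.select(
    col("persons_name"),
    col("persons_age"),
    col("persons_cars_brand"),
    explode("persons_cars_brands.models").alias("persons_cars_model"),
)

persons_cars_models.show()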

+------------+-----------+------------------+------------------+
|persons_name|persons_age|persons_cars_brand|persons_cars_model|
+------------+-----------+------------------+------------------+
|        John|         30|              Ford|            Fiesta|
|        John|         30|              Ford|             Focus|
|        John|         30|              Ford|           Mustang|
|        John|         30|               BMW|               320|
|        John|         30|               BMW|                X3|
|        John|         30|               BMW|                X5|
|       Peter|         46|           Huyndai|               i10|
|       Peter|         46|           Huyndai|               i30|
|       Peter|         46|          Mercedes|              E320|
|       Peter|         46|          Mercedes|           E63 AMG|
+------------+-----------+------------------+------------------+
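
If it is unclear what each explode step does to the rows, the plain-Python sketch below mimics the same three steps on a trimmed copy of the sample, without Spark. It is only an illustration of the row-multiplying behaviour, not how Spark implements it; `explode_rows` and `sample_json` are hypothetical names made up for this example.

```python
import json

# Trimmed copy of the sample document (one person only), for illustration.
sample_json = """
{
  "persons": [
    {
      "name": "John",
      "age": 30,
      "cars": [
        {"name": "Ford", "models": ["Fiesta", "Focus"]},
        {"name": "BMW", "models": ["320"]}
      ]
    }
  ]
}
"""


def explode_rows(rows, column, alias):
    """Hypothetical helper: emit one copy of each row per element of row[column].

    The array column is dropped and the single element is stored under `alias`.
    Rows whose array is empty produce no output rows.
    """
    for row in rows:
        others = {k: v for k, v in row.items() if k != column}
        for element in row[column]:
            yield {**others, alias: element}


document = json.loads(sample_json)

# Step 1: one row per person.
persons = list(explode_rows([document], "persons", "persons"))

# Step 2: pick the person fields, then one row per (person, car).
persons_cars = list(explode_rows(
    [{"persons_name": r["persons"]["name"],
      "persons_age": r["persons"]["age"],
      "cars": r["persons"]["cars"]} for r in persons],
    "cars", "persons_cars_brands",
))

# Step 3: pick the brand name, then one row per (person, car, model).
persons_cars_models = list(explode_rows(
    [{"persons_name": r["persons_name"],
      "persons_age": r["persons_age"],
      "persons_cars_brand": r["persons_cars_brands"]["name"],
      "models": r["persons_cars_brands"]["models"]} for r in persons_cars],
    "models", "persons_cars_model",
))

for row in persons_cars_models:
    print(row)
```

Each call turns one row holding an array into as many rows as the array has elements, copying the other fields along, which is why the nested levels are flattened one step at a time.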