Question

PySpark: parsing a stream of JSON bytes from kafka-console-producer into a DataFrame

I am trying to parse this JSON using a given schema, but it fails with the error "AssertionError: keyType should be DataType". What should I do to parse the JSON with a custom schema?

schema = StructType()\
    .add("contact_id", LongType())\
    .add("first_name", StringType())\
    .add("last_name", StringType())\
    .add("contact_number", MapType(StringType,
                                   StructType()
                                   .add("home", LongType())
                                   .add("contry_code", StringType())))

The JSON data is expected in this format: {"contact_id": "23", "first_name": "John", "last_name": "Doe", "contact_number": {"home": 4564564567, "country_code": "+1"}}

Answer 1

I have found the solution. This should be the correct schema definition.

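from pyspark.sql.types import (
    ArrayType, LongType, StringType, StructField, StructType,
)

# Every type is instantiated (note the parentheses), so each argument is a DataType instance.
schema = StructType([
    StructField("contactId", LongType(), True),
    StructField("firstName", StringType(), True),
    StructField("lastName", StringType(), True),
    StructField("contactNumber", ArrayType(
        StructType([
            StructField("type", StringType(), True),
            StructField("number", LongType(), True),
            StructField("countryCode", StringType(), True),
        ])
    ), True),
])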
