I am reading messages from a Kafka topic:
messageDFRaw = spark.readStream\
.format("kafka")\
.option("kafka.bootstrap.servers", "localhost:9092")\
.option("subscribe", "test-message")\
.load()
messageDF = messageDFRaw.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING) as dict")
When I print the DataFrame from the query above, I get the following console output:
|key|dict|
|#badbunny |{"channel": "#badbunny", "username": "mgat22", "message": "cool"}|
How can I create a DataFrame from the DataStreamReader so that its columns are |key|channel|username|message|?
I tried following the accepted answer to "How to read records in JSON format from Kafka using Structured Streaming?":
struct = StructType([
StructField("channel", StringType()),
StructField("username", StringType()),
StructField("message", StringType()),
])
messageDFRaw.select(from_json("CAST(value AS STRING)", struct))
However, I get "Expected type 'StructField', got 'StructType' instead" in from_json().
Answer 0 (score: 0)
I ignored the warning "Expected type 'StructField', got 'StructType' instead" in from_json().
However, I had to cast the value from the Kafka message to a string first, and only then parse it against the JSON schema.
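from pyspark.sql.functions import from_json

# struct_schema is the StructType from the question (named `struct` there).
messageDF = messageDFRaw.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
messageParsedDF = messageDF.select("key", from_json("value", struct_schema).alias("message"))
messageFlattenedDF = messageParsedDF.selectExpr("key", "message.channel", "message.username", "message.message")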