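I only want to get the first message from the Kafka producer. Based on that message, I will take the schema from the first record and apply it to the records that arrive after it.
Is there a way to get the cell in the first row of the value column as a Python string?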
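When a row is collected without a CAST, PySpark returns the Kafka `value` column (a `BinaryType`) as a Python `bytearray`. Below is a minimal sketch of turning such a cell into a `str`. It assumes the producer wrote UTF-8 text, and `value_as_str` is a hypothetical helper name:

```python
from typing import Optional, Union


def value_as_str(raw: Union[bytes, bytearray, str, None], encoding: str = "utf-8") -> Optional[str]:
    """Convert a Kafka `value` cell taken from a collected PySpark Row into a str.

    BinaryType cells come back as bytearray, a column already CAST to STRING
    comes back as str, and a null Kafka value comes back as None.
    """
    if raw is None:
        return None
    if isinstance(raw, (bytes, bytearray)):
        return raw.decode(encoding)  # assumes the producer wrote UTF-8 text
    return raw
```

It would be called on a Row from a static DataFrame, for example `value_as_str(batch_df.first()["value"])` inside a `foreachBatch` callback. `first()` and `head()` cannot be called on the streaming DataFrame itself.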
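# I have configured the Spark Kafka readStream.
# The Kafka source has a fixed schema (key, value, topic, partition, offset,
# timestamp, timestampType); it does not infer one from the message contents.
df_stream = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", bootstrap_kafka_server) \
    .option("subscribe", topic) \
    .load()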
# The value column is a byte array, so I cast it to a string.
df_stream_value = df_stream.select(df_stream.value.cast("string").alias('value')) \
    .groupBy("value").count()
query = df_stream_value.writeStream.outputMode("complete").format("console").start()
# query.name() # get the name of the auto-generated or user-specified name
# query.explain() # print detailed explanations of the query
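# This check fails: head() cannot be used on a streaming DataFrame (Spark raises an
# AnalysisException saying queries with streaming sources must be executed with
# writeStream.start()). PySpark Row objects also have no getInt(); their fields are
# read as row[0] or row["value"].
if df_stream_value.head().getInt(0) > 0:
    query.stop()
query.awaitTermination()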
# def process_row(row):
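Because `head()` and `first()` only work on a static DataFrame, one way to reach the first record is `foreachBatch`, whose callback receives each micro-batch as an ordinary DataFrame. This is a minimal sketch under several assumptions:

- The first message is a flat JSON object with simple field names.
- `infer_ddl` and `FirstRecordSchema` are hypothetical names.
- The schema is inferred in plain Python, and later batches are parsed with Spark SQL's `from_json`.

Also note that `first()` returns the first row of the micro-batch. Kafka only orders records within a partition, not across the whole topic.

```python
import json


def infer_ddl(sample_json: str) -> str:
    """Build a flat Spark DDL schema string such as "`id` BIGINT, `name` STRING"
    from one JSON object (assumed flat, with simple field names)."""
    record = json.loads(sample_json)
    if not isinstance(record, dict):
        raise ValueError("expected the first message to be a JSON object")
    columns = []
    for name, value in record.items():
        if isinstance(value, bool):  # test bool first: bool is a subclass of int
            spark_type = "BOOLEAN"
        elif isinstance(value, int):
            spark_type = "BIGINT"
        elif isinstance(value, float):
            spark_type = "DOUBLE"
        else:  # strings, nulls and nested values are all typed as STRING (assumption)
            spark_type = "STRING"
        columns.append(f"`{name}` {spark_type}")
    return ", ".join(columns)


class FirstRecordSchema:
    """Hypothetical foreachBatch callback: fixes the schema from the first record it
    sees, then parses every micro-batch with that schema."""

    def __init__(self):
        self.ddl = None  # set once, from the first non-empty micro-batch

    def __call__(self, batch_df, batch_id):
        # batch_df is a static DataFrame here, so first() is allowed
        values = batch_df.selectExpr("CAST(value AS STRING) AS value")
        if self.ddl is None:
            first_row = values.first()  # a Row, or None for an empty micro-batch
            if first_row is None:
                return
            self.ddl = infer_ddl(first_row["value"])  # first_row["value"] is a plain str
        parsed = values.selectExpr(f"from_json(value, '{self.ddl}') AS data").select("data.*")
        parsed.show(truncate=False)  # stand-in for a real sink
```

It could be attached to the stream from the question with something like `df_stream.writeStream.foreachBatch(FirstRecordSchema()).start()`. In classic (non-Connect) PySpark the callback runs on the driver, so the `ddl` set by the first non-empty batch should be reused for later batches.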