Is there a way to convert a message [value column data] in a Spark DataFrame into a string variable?

Asked: 2019-07-10 20:45:52

Tags: apache-spark pyspark apache-kafka pyspark-sql spark-streaming-kafka

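I only want to take the first message from the Kafka producer. From that first record I will derive the schema, and then apply that schema to the records that arrive afterwards.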

Is there a way to get the cell data in the first row of the value column as a Python string?

# Configure the Spark Kafka readStream
df_stream = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", bootstrap_kafka_server) \
    .option("subscribe", topic) \
    .option("inferSchema", "true") \
    .load()

# The value column arrives as a byte array, so cast it to a string.

df_stream_value = df_stream.select(df_stream.value.cast("string").alias('value'))\
                            .groupBy("value").count()

query = df_stream_value.writeStream.outputMode("complete").format("console").start()


# query.name  # the auto-generated or user-specified query name (a property in PySpark)
# query.explain()  # print detailed explanations of the query


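# Intended: stop the query once the first message has arrived.
# Note: head() is a batch action, so Spark rejects it on a streaming
# DataFrame, and PySpark Row objects have no getInt() method.
if df_stream_value.head().getInt(0) > 0: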
    query.stop()

query.awaitTermination()
# def process_row(row):
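
One possible way to get the first value cell as a Python string is to read the topic once in batch mode (`spark.read` rather than `spark.readStream`), where actions such as `first()` are allowed, and only then start the stream. The sketch below is not a tested solution. It assumes the messages are UTF-8 encoded JSON, that the spark-sql-kafka connector is on the classpath, and that Spark is version 2.4 or later (for `schema_of_json`). The helper names `value_to_str`, `first_message` and `parsed_stream` are made up for illustration. Row order across Kafka partitions is not guaranteed, so the "first" record here is simply the first row Spark returns.

```python
from typing import Optional, Union


def value_to_str(value: Union[bytes, bytearray, str, None]) -> Optional[str]:
    """Turn one collected Kafka `value` cell into a plain Python string."""
    if value is None:  # Kafka allows null payloads
        return None
    if isinstance(value, (bytes, bytearray)):
        # PySpark returns binary columns as bytearray; UTF-8 is an assumption.
        return bytes(value).decode("utf-8")
    return str(value)


def first_message(spark, servers: str, topic: str) -> Optional[str]:
    """Batch-read the topic once and return one record's value as a str."""
    row = (
        spark.read.format("kafka")  # batch read, so first() is permitted
        .option("kafka.bootstrap.servers", servers)
        .option("subscribe", topic)
        .option("startingOffsets", "earliest")
        .option("endingOffsets", "latest")
        .load()
        .select("value")
        .first()  # returns None when the topic is empty
    )
    return value_to_str(row["value"]) if row is not None else None


def parsed_stream(spark, servers: str, topic: str, sample: str):
    """Infer a schema from the sample JSON and apply it to the streaming read."""
    from pyspark.sql import functions as F  # imported here so the helper above works without Spark

    schema = F.schema_of_json(F.lit(sample))  # schema as a Column, derived from the sample
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", servers)
        .option("subscribe", topic)
        .load()
        .select(F.from_json(F.col("value").cast("string"), schema).alias("data"))
        .select("data.*")  # one column per JSON field
    )
```

With the variables from the question, this could be used as `sample = first_message(spark, bootstrap_kafka_server, topic)`, followed by `parsed_stream(spark, bootstrap_kafka_server, topic, sample)` once `sample` is not `None`. The result is a streaming DataFrame that can be passed to `writeStream` as before.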

0 answers:

No answers