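I am currently using the following code to read streaming data from Azure EventHub. I need to apply a window function so that I read roughly 5 minutes of records and parse them. Is there a window function available so that only 5 minutes of records are read?
Here is the code I am currently running: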
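import org.apache.spark.eventhubs.EventHubsConf
import org.apache.spark.sql.types.{LongType, StringType, TimestampType}
import spark.implicits._ // enables the $"column" syntax (already imported in Databricks notebooks)
val namespaceName = "XXXXX"
val eventHubName = "XXXXX"
val sasKeyName = "XXXXXX"
val sasKey = "XXXXXX"
val connStr = new com.microsoft.azure.eventhubs.ConnectionStringBuilder()
  .setNamespaceName(namespaceName)
  .setEventHubName(eventHubName)
  .setSasKeyName(sasKeyName)
  .setSasKey(sasKey)
// setMaxEventsPerTrigger(5) caps each micro-batch at 5 events; it does not limit the stream to 5 minutes
val customEventhubParameters = EventHubsConf(connStr.toString())
  .setMaxEventsPerTrigger(5)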
// Create a stream that reads data from the specified Event Hub.
val incomingStream = spark.readStream.format("eventhubs").options(customEventhubParameters.toMap).load()
val messages = incomingStream
  .withColumn("Offset", $"offset".cast(LongType))
  .withColumn("Time (readable)", $"enqueuedTime".cast(TimestampType))
  .withColumn("Timestamp", $"enqueuedTime".cast(LongType))
  .withColumn("Body", $"body".cast(StringType))
  .select("Offset", "Time (readable)", "Timestamp", "Body")
messages.printSchema
root
|-- Offset: long (nullable = true)
|-- Time (readable): timestamp (nullable = true)
|-- Timestamp: long (nullable = true)
|-- Body: string (nullable = true)
messages.writeStream.outputMode("append").format("console").option("truncate", false).start()
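`start()` returns a `StreamingQuery` handle, so one way to get roughly five minutes of data instead of an open-ended stream is to wait on that handle with a timeout and then stop it. This is a minimal sketch, not a definitive implementation: the object `BoundedRun` and its helper `runFor` are hypothetical names, and the timeout bounds wall-clock running time, not the enqueued-time range of the events that are read.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.streaming.StreamingQuery

// Hypothetical helper (not part of Spark or the Event Hubs connector).
object BoundedRun {
  // Writes `df` to the console, waits at most `runForMs` milliseconds, then stops the query.
  def runFor(df: DataFrame, runForMs: Long): StreamingQuery = {
    val query = df.writeStream
      .outputMode("append")
      .format("console")
      .option("truncate", false)
      .start()
    // Blocks until the query terminates or the timeout elapses, whichever comes first;
    // rethrows a StreamingQueryException if the query failed in the meantime.
    query.awaitTermination(runForMs)
    // Stop the query if it is still running after the timeout.
    query.stop()
    query
  }
}
```

Used in place of the `writeStream` line above, this would be `BoundedRun.runFor(messages, 5 * 60 * 1000L)`. Which events end up in those five minutes depends on the configured starting position and on `setMaxEventsPerTrigger`.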
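The code above starts a continuous stream, but I only need 5 minutes of data for further parsing. Is there a window function so that only 5 minutes of data is read from Azure EventHub? Any help would be appreciated.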
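Spark itself provides an event-time window function, `org.apache.spark.sql.functions.window`, which assigns rows to fixed (tumbling) 5-minute buckets by a timestamp column. It does not stop the stream; it groups whatever arrives into 5-minute buckets by enqueued time. A minimal sketch, assuming the `messages` DataFrame and its `Time (readable)` column from above. The object `FiveMinuteWindows`, the helper column name `eventTime`, the 1-minute watermark delay, and `count` as the aggregation are illustrative assumptions; `count` is a placeholder for whatever per-window aggregation the parsing needs.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, count, window}

// Hypothetical helper (not part of Spark or the Event Hubs connector).
object FiveMinuteWindows {
  // Groups rows into 5-minute tumbling windows on "Time (readable)".
  // The column is first renamed to a plain identifier (assumption: this sidesteps
  // quoting the space and parentheses when withWatermark parses the name).
  // The watermark (max event time seen minus 1 minute) lets Spark finalize a window
  // in append mode once the watermark passes the window's end; on a batch
  // DataFrame the watermark has no effect.
  def fiveMinuteCounts(df: DataFrame): DataFrame =
    df.withColumnRenamed("Time (readable)", "eventTime")
      .withWatermark("eventTime", "1 minute")
      .groupBy(window(col("eventTime"), "5 minutes"))
      .agg(count("*").as("events"))
}
```

On the stream this would be `FiveMinuteWindows.fiveMinuteCounts(messages).writeStream.outputMode("append").format("console").option("truncate", false).start()`. In append mode a window's row is emitted only after the watermark moves past that window's end, so each 5-minute result appears somewhat after the window closes rather than immediately.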