我使用Spark结构化流API从MQTT流源读取数据。
val lines:= spark.readStream
.format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
.option("topic", "Employee")
.option("username", "username")
.option("password", "passwork")
.option("clientId", "employee11")
.load("tcp://localhost:8000").as[(String, Timestamp)]
我将流数据转换为案例类Employee
case class Employee(Name: String, Department: String)
val ds = lines.map {
row =>
implicit val format = DefaultFormats
parse(row._1).extract[Employee]
}
....some transformations
df.writeStream
.outputMode("append")
.format("es")
.option("es.resource", "spark/employee")
.option("es.nodes", "localhost")
.option("es.port", 9200)
.start()
.awaitTermination()
现在,队列中有些消息的结构与案例类Employee
不同。可以说缺少一些必填列。我的流作业失败,找不到字段。
现在,我将要处理此类异常,并且还希望针对该异常发送警报通知。我尝试放入try / catch块。
case class ErrorMessage(row: String)
catch {
case e: Exception =>
val ds = lines.map {
row =>
implicit val format = DefaultFormats
parse(row._1).extract[ErrorMessage]
}
val error = lines.foreach(row => {
sendErrorMail(row._1)
})
}
}
获得了Exception in thread "main" org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();;
mqtt
的例外
任何帮助,将不胜感激。
答案 0 :(得分:0)
我在catch块中创建了一个foreach接收器,并且能够处理异常并发送邮件警报。
catch {
case e: Exception =>
val foreachWriter = new ForeachWriter[Row] {
override def open(partitionId: Timestamp, version: Timestamp): Boolean = {
true
}
override def process(value: Row): Unit = {
code for sending mail.........
}
override def close(errorOrNull: Throwable): Unit = {}
}
val df = lines.selectExpr("cast (value as string) as json")
df.writeStream
.foreach(foreachWriter)
.outputMode("append")
.start()
.awaitTermination()
}
答案 1 :(得分:0)
我认为您应该使用Spark streaming doc中所述的start()
方法的返回对象。像这样:
val query = df.writeStream. ... .start()
try {
//If the query has terminated with an exception, then the exception will be thrown.
query.awaitTermination()
catch {
case ex: Exception => /*code to send mail*/
}
实施自己的foreach接收器会导致频繁打开和关闭连接的开销。