I added a second writeStream (sink):
scala
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.{ForeachWriter, Row, SparkSession}

// Sink for the first query: prints the first column of each row.
case class MyWriter1() extends ForeachWriter[Row] {
  override def open(partitionId: Long, version: Long): Boolean = true
  override def process(value: Row): Unit = {
    println(s"custom1 - ${value.get(0)}")
  }
  override def close(errorOrNull: Throwable): Unit = ()
}

// Sink for the second query: prints each (word, count) tuple.
case class MyWriter2() extends ForeachWriter[(String, Int)] {
  override def open(partitionId: Long, version: Long): Boolean = true
  override def process(value: (String, Int)): Unit = {
    println(s"custom2 - $value")
  }
  override def close(errorOrNull: Throwable): Unit = ()
}

object Main extends Serializable {
  def main(args: Array[String]): Unit = {
    println("starting")
    Logger.getLogger("org").setLevel(Level.OFF)
    Logger.getLogger("akka").setLevel(Level.OFF)
    val host = "localhost"
    val port = "9999"
    val spark = SparkSession
      .builder
      .master("local[*]")
      .appName("app-test")
      .getOrCreate()
    import spark.implicits._
    // Create DataFrame representing the stream of input lines from connection to host:port
    val lines = spark.readStream
      .format("socket")
      .option("host", host)
      .option("port", port)
      .load()
    // Split the lines into words
    val words = lines.as[String].flatMap(_.split(" "))
    // Generate running word count
    val wordCounts = words.groupBy("value").count()
    // Start the first query, which sends the running counts to MyWriter1
    val query1 = wordCounts.writeStream
      .outputMode("update")
      .foreach(MyWriter1())
      .start()
    // Start the second query, which sends (word, count) tuples to MyWriter2
    val ds = wordCounts.map(x => (x.getAs[String]("value"), x.getAs[Int]("count")))
    val query2 = ds.writeStream
      .outputMode("update")
      .foreach(MyWriter2())
      .start()
    spark.streams.awaitAnyTermination()
  }
}
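Unfortunately, only the first query runs; the second one never does (MyWriter2 is never called).
Please tell me what I am doing wrong. According to the docs: you can start any number of queries in a single SparkSession. They will all run concurrently, sharing the cluster resources.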
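Answer 0 (score: 1)
Are you using nc -lk 9999 to send data to Spark? Each query creates its own connection to nc, but nc can only send data to the first connection (query). You can write a TCP server to use instead of nc.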
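A minimal sketch of such a TCP server, assuming plain java.net sockets and Scala 2.12 or later. BroadcastServer and its methods are hypothetical names, not part of Spark or nc. It accepts any number of connections and sends every line it is given to all of them, so each query's socket source would receive the same input.

```scala
import java.io.{IOException, PrintWriter}
import java.net.ServerSocket
import java.util.concurrent.CopyOnWriteArrayList
import scala.io.StdIn

// Hypothetical helper (not a Spark API): a line-based server that
// broadcasts each line to every connected client.
class BroadcastServer(port: Int) {
  private val server = new ServerSocket(port)
  private val clients = new CopyOnWriteArrayList[PrintWriter]()

  // Accept connections on a daemon thread so the caller is not blocked.
  private val acceptor = new Thread(new Runnable {
    override def run(): Unit = {
      try {
        while (true) {
          val socket = server.accept()
          // autoFlush = true: println() pushes the line out immediately.
          clients.add(new PrintWriter(socket.getOutputStream, true))
        }
      } catch {
        case _: IOException => () // server socket was closed
      }
    }
  })
  acceptor.setDaemon(true)
  acceptor.start()

  // The port actually bound (useful when constructed with port 0).
  def localPort: Int = server.getLocalPort

  // Number of clients accepted so far.
  def clientCount: Int = clients.size

  // Send one line to every connected client.
  def broadcast(line: String): Unit = {
    val it = clients.iterator()
    while (it.hasNext) it.next().println(line)
  }

  // Stop accepting connections and close all client streams.
  def close(): Unit = {
    server.close()
    val it = clients.iterator()
    while (it.hasNext) it.next().close()
  }
}

object BroadcastServer {
  // Reads lines from stdin and broadcasts them, like typing into nc.
  // Port 9999 matches the question; change it as needed.
  def main(args: Array[String]): Unit = {
    val server = new BroadcastServer(9999)
    Iterator.continually(StdIn.readLine())
      .takeWhile(_ != null)
      .foreach(server.broadcast)
    server.close()
  }
}
```

Start this in place of nc -lk 9999, then start the Spark application. Lines typed before a query has connected are not replayed to it, so wait until both queries are up before sending input.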
Answer 1 (score: 1)
I ran into the same situation (but on the newer Structured Streaming API), and in my case it helped to call awaitTermination() on the last StreamingQuery.
Something like:
query1.start()
query2.start().awaitTermination()
Update: instead of the above, this built-in solution/approach is better:
sparkSession.streams.awaitAnyTermination()
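Answer 2 (score: 0)
You are using .awaitAnyTermination(), which terminates the application as soon as the first stream returns; you have to wait for both streams to finish before terminating.
Something like this should do the trick: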
query1.awaitTermination()
query2.awaitTermination()
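The same idea can be generalised to any number of queries. This is a sketch only: AwaitAll and awaitAllTerminations are hypothetical names, and it assumes Spark is on the classpath and that every query has already been started. It relies on spark.streams.active, which lists the currently active queries, and on StreamingQuery.awaitTermination(), which blocks until that query stops and rethrows the error if it failed.

```scala
import org.apache.spark.sql.SparkSession

object AwaitAll {
  // Hypothetical helper: blocks until every query that is active
  // at the time of the call has terminated.
  def awaitAllTerminations(spark: SparkSession): Unit = {
    // Snapshot of the active queries; queries started later are not awaited.
    val queries = spark.streams.active
    queries.foreach(_.awaitTermination())
  }
}
```

In the question's Main, this would be called in place of spark.streams.awaitAnyTermination(), after both start() calls.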