Question

I have a Spark Streaming job that performs some aggregations on an incoming Kafka stream and saves the results in Hive. However, I have about 5 Spark SQL queries to run on the incoming data. They could run concurrently, since there are no dependencies among these transformations, and if possible I would like to run them concurrently without waiting for the first SQL query to finish. They all write to separate Hive tables. For example:

// This is the Kafka inbound stream
// Code in Consumer
val stream = KafkaUtils.createDirectStream[..](...)

val metric1 = Future { computeFuture(stream, dataframe1, countIndex) }
val metric2 = Future { computeFuture(stream, dataframe2, countIndex) }
val metric3 = Future { computeFirstFuture(stream, dataframe3, countIndex) }
val metric4 = Future { computeFirstFuture(stream, dataframe4, countIndex) }

metric1.onFailure { case e => logger.error(s"Future failed with an .... exception", e) }
metric2.onFailure { case e => logger.error(s"Future failed with an .... exception", e) }

....and so on

When I do the above, the actions inside the Futures appear to run sequentially (according to the Spark web UI). How can I force concurrent execution? I am using Spark 2.0 and Scala 2.11.8. Do I need to create separate Spark sessions?

Concurrent execution in Spark Streaming

0 answers: