Running Spark SQL stage jobs in parallel

Time: 2016-10-06 14:24:25

Tags: scala apache-spark apache-spark-sql

I am loading a text file:

val adReqRDD = sc.textFile("/Users/itru/Desktop/vastrack_sample_old.rtf")

I register the data as a temp table:

adReqRDD.registerTempTable("adreqdata")

I need to run the following queries against that table:

val alladreq = sqlContext.sql("select DeviceId,count(EventType) as AllAdreqCount from adreqdata where EventType = 1 and Network = 0 group by DeviceId ")

val adreqPerDeviceid = sqlContext.sql("select DeviceId,count(EventType) as AdreqCount from adreqdata where EventType = 1 and Network = 0 and PlacementId <> '-' and BundleID <> '-' and DeviceId <> '-' and IPAddress <> '-' group by DeviceId ")

val adreqPerDeviceidtoSpotx = sqlContext.sql("select DeviceId,count(EventType) as AdreqCountToSpotx from adreqdata where EventType = 1 and Network = 9 and PlacementId <> '-' and BundleID <> '-' and DeviceId <> '-' and IPAddress <> '-' group by DeviceId ")

Once my job starts, all 3 active stages run sequentially. How can I make them run in parallel?

1 Answer:

Answer 0 (score: 0)

You can use Futures to launch the Spark actions concurrently. Something like this:

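import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

val queries = Seq(
  "query1",
  "query2",
  "query3"
)

// Start one Future per query; each Future submits its Spark action from its own thread.
val results = Future.traverse(queries)(q => Future({
  val queryResult = sqlContext.sql(q)
  queryResult.write.format...
}))

// Block until every query has finished.
Await.result(results, Duration.Inf)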
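
To make the pattern above easier to try without a cluster, here is a minimal self-contained sketch. It rests on some assumptions: `runQuery` is a hypothetical stand-in for `sqlContext.sql(q)` followed by an action (it only sleeps and returns a string), and the dedicated fixed-size thread pool is an illustrative choice, not something the answer prescribes. Note that `sqlContext.sql(q)` on its own only builds a lazy DataFrame, so the action (a write, `count`, `collect`, and so on) has to run inside the `Future` for the jobs to be submitted concurrently.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

object ParallelQueries {
  // Hypothetical stand-in for `sqlContext.sql(q)` plus an action.
  // It sleeps to simulate work and returns a marker string.
  def runQuery(q: String): String = {
    Thread.sleep(200)
    s"done: $q"
  }

  // Runs every query on its own thread and returns the results in input order.
  def runAll(queries: Seq[String]): Seq[String] = {
    // One thread per query (assumption: the list of queries is small).
    val pool = Executors.newFixedThreadPool(math.max(1, queries.size))
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
    try {
      // Future.traverse starts one Future per element and collects the results.
      val results: Future[Seq[String]] =
        Future.traverse(queries)(q => Future(runQuery(q)))
      Await.result(results, Duration.Inf)
    } finally {
      pool.shutdown()
    }
  }

  def main(args: Array[String]): Unit = {
    runAll(Seq("query1", "query2", "query3")).foreach(println)
  }
}
```

With Spark, submitting jobs from several threads only lets them overlap if the cluster has spare resources; how the concurrent jobs share executors also depends on the scheduler mode configured for the application.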