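I'm new to Scala and Spark. I need to write a Spark job that calls an API for each URL in an input file, urls.txt. My sample code is below. I suspect the Await call is limiting the job's throughput, but with my limited experience I can't come up with a better way to achieve this. Any help is much appreciated.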
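import java.util.concurrent.TimeoutException

import scala.concurrent.Await
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success}

import org.apache.spark.{SparkConf, SparkContext}

object MyApp extends App {
  // Assumed setup: `sc` was used without being defined in my original snippet.
  val sc = new SparkContext(new SparkConf().setAppName("MyApp"))
  val partitions = 10
  val textFile = sc.textFile("file:///tmp/urls.txt", partitions)

  // Start one HTTP request per URL.
  // InitializeConfigurations is my own helper that returns a WS client (not shown).
  val futures = textFile.map { url =>
    val wsclient = InitializeConfigurations.getWSClient()
    val future = wsclient.url(url).withRequestTimeout(10.seconds).get().map(_.body)
    future.onComplete {
      case Success(res) => println(s"oncomplete: res = $res")
      case Failure(_: TimeoutException) => () // timeouts are ignored here
      case Failure(ex) => ex.printStackTrace()
    }
    future
  }

  // Block on each future and keep the body; failed or timed-out requests are dropped.
  val hresps = futures.flatMap { f =>
    try {
      Some(Await.result(f, 10.seconds))
    } catch {
      case _: Exception => None
    }
  }
  hresps.saveAsTextFile("file:///tmp/a01.txt")
}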
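To show what I suspect is happening, here is a minimal sketch without Spark or the WS client. It assumes that the elements of a partition are processed through a lazy iterator, so each future is only created when the next element is pulled and is awaited before the following one starts. `fakeFetch`, `oneByOne`, `batched` and `timed` are hypothetical names I made up for this illustration; `fakeFetch` just sleeps instead of making a real request.

```scala
import scala.concurrent.{Await, Future, blocking}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object AwaitSketch {
  // Hypothetical stand-in for the real HTTP call: sleeps, then returns a fake body.
  def fakeFetch(url: String, latencyMs: Long): Future[String] =
    Future {
      // `blocking` tells the global pool this task sleeps, so it may add threads.
      blocking { Thread.sleep(latencyMs) }
      s"body of $url"
    }

  // Runs a block and returns its result with the elapsed wall-clock milliseconds.
  def timed[A](block: => A): (A, Long) = {
    val start = System.nanoTime()
    val result = block
    (result, (System.nanoTime() - start) / 1000000)
  }

  // Same shape as my job: a lazy iterator, so a future is created only when
  // the next element is pulled, and it is awaited before the next one starts.
  def oneByOne(urls: Seq[String], latencyMs: Long): List[String] =
    urls.iterator
      .map(u => fakeFetch(u, latencyMs))
      .map(f => Await.result(f, 10.seconds))
      .toList

  // Possible alternative: start every request first, then block once for the batch.
  def batched(urls: Seq[String], latencyMs: Long): List[String] = {
    val started = urls.map(u => fakeFetch(u, latencyMs)).toList
    Await.result(Future.sequence(started), 10.seconds)
  }

  def main(args: Array[String]): Unit = {
    val urls = Seq("u1", "u2", "u3", "u4")
    val (a, tA) = timed(oneByOne(urls, 200))
    val (b, tB) = timed(batched(urls, 200))
    println(s"one by one: ${a.size} results in $tA ms")
    println(s"batched:    ${b.size} results in $tB ms")
  }
}
```

If this assumption holds for Spark as well, something like `batched` could perhaps be applied per partition inside `mapPartitions`, which would also let me create the client once per partition rather than once per URL. I have not verified that on a cluster.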