I read the following in the Scalding groupAll documentation:
/**
* Group all tuples down to one reducer.
* (due to cascading limitation).
* This is probably only useful just before setting a tail such as Database
* tail, so that only one reducer talks to the DB. Kind of a hack.
*/
def groupAll: Pipe = groupAll { _.pass }
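This gives me good reason to believe that if I pipe the result of my final write into a statusUpdater pipe, which just updates some database to record that my job completed successfully, then it will be executed once, after the job completes. I have the following code sample: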
import Dsl._

somepipe
  .addCount
  .toPipe(outputSchema)
  .write(Tsv(outputPath, outputSchema, writeHeader = true))(flowDef, mode)
  .groupAll.updateResultStatus

implicit class StatusResultsUpdater(pipe: Pipe) {
  def updateResultStatus: Pipe = {
    println("DO THIS ONCE AFTER JOB COMPLETES!") // was printed even before the job ended! how to have it print only when job ends!?
    pipe
  }
}
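According to the documentation, since I use groupAll, updateResultStatus should run only once, after the job ends. Why do I see the statement printed before the job has finished? Am I missing something? What should I do instead?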
Answer 0 (score: 4)
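The order of execution in a Scalding job is a bit tricky:
Based on your code, the println statement is executed at step 1. It is ordinary Scala code in the body of updateResultStatus, so it runs as soon as that method is called while the pipe is being assembled, not when the flow actually processes data.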
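To make the ordering concrete, here is a minimal plain-Scala sketch that does not use Scalding at all. MiniPipe, LazyPipeDemo and its run method are hypothetical stand-ins invented for illustration; they only mimic the idea that a pipe describes work that is executed later.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for a pipe: it only *describes* work, it does not run it.
final case class MiniPipe(steps: List[String]) {
  def step(name: String): MiniPipe = MiniPipe(steps :+ name)
}

object LazyPipeDemo {
  // Records the order in which things actually happen.
  val events = ListBuffer.empty[String]

  // Mirrors updateResultStatus from the question: the side effect fires
  // while the pipe is being described, not when data flows through it.
  implicit class StatusOps(pipe: MiniPipe) {
    def updateResultStatus: MiniPipe = {
      events += "status update"
      pipe
    }
  }

  // Hypothetical stand-in for executing the assembled flow.
  def run(pipe: MiniPipe): Boolean = {
    pipe.steps.foreach(s => events += s"run $s")
    true
  }

  def demo(): List[String] = {
    events.clear()
    val pipe = MiniPipe(Nil).step("write").step("groupAll").updateResultStatus
    events += "pipe built"
    val ok = run(pipe)
    // A once-after-completion action belongs here, after the run has finished.
    if (ok) events += "post-run hook"
    events.toList
  }
}
```

In this sketch the "status update" event is recorded before "pipe built" and before any "run ..." event, which is the same effect as the early println in the question. In a real Scalding job, the analogous place for a once-after-completion action would be code that executes after the flow has finished, for example after the job's run method returns successfully. Treat that as an assumption to check against the Job API of your Scalding version.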