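I have recently been reading the Hadoop source code. While reading the OutputFormat class, I noticed that the checkOutputSpecs
method has the following documentation: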
Check for validity of the output-specification for the job.
<p>This is to validate the output specification for the job when it is
a job is submitted. Typically checks that it does not already exist,
throwing an exception when it already exists, so that output is not
overwritten.</p>
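To make the described behavior concrete, here is a minimal sketch of how I understand the check. It uses only java.nio.file on the local filesystem and is not Hadoop's actual implementation; the class name OutputSpecCheck and the helper signature are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class OutputSpecCheck {
    // Hypothetical stand-in for the documented check (not Hadoop code):
    // reject the job before it runs if the output directory is already there.
    static void checkOutputSpecs(Path outputDir) throws IOException {
        if (outputDir == null) {
            throw new IOException("Output directory not set.");
        }
        if (Files.exists(outputDir)) {
            throw new FileAlreadyExistsException(
                    outputDir.toString(), null, "Output directory already exists");
        }
    }

    public static void main(String[] args) throws IOException {
        // An existing directory plays the role of a previous job's output.
        Path existing = Files.createTempDirectory("job-output");
        try {
            checkOutputSpecs(existing);
        } catch (FileAlreadyExistsException e) {
            System.out.println("Rejected: " + e.getMessage());
        }

        // A path that does not exist yet passes the check.
        Path fresh = existing.resolve("new-run");
        checkOutputSpecs(fresh);
        System.out.println("Accepted: " + fresh.getFileName());
    }
}
```

In this sketch the check runs once, before any work starts, so a job pointed at an existing directory fails immediately instead of after computing its results.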
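My question is: why must the output directory not already exist?
My guess is that if the output directory already contains part-**** files,
this check prevents the old files from being overwritten.
Are there any other reasons?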