Loading an empty ORC folder fails as shown below. I want to bypass this somehow.
val df = spark.read.format("orc").load(orcFolderPath)
org.apache.spark.sql.AnalysisException: Unable to infer schema for ORC. It must be specified manually.;
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:185)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$7.apply(DataSource.scala:185)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema(DataSource.scala:184)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:373)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:223)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:211)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
... 49 elided
This error presumably happens because the ORC reader tries to infer a schema but the folder contains no files to infer it from. Since empty folders can legitimately show up in the store, I want to bypass this particular case instead of letting every read fail on it.
try {
  spark.read.format("orc").load(path)
} catch {
  // Swallow the schema-inference failure on an empty folder and fall back to null.
  case ex: org.apache.spark.sql.AnalysisException =>
    null
}
For now I am catching the exception like this; any other approach would be helpful.
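A slightly more idiomatic variant of the same idea is to wrap the read in scala.util.Try so the empty-folder case becomes None rather than null (a minimal sketch; tryReadOrc is an illustrative name and spark is assumed to be a SparkSession in scope):

import scala.util.Try
import org.apache.spark.sql.DataFrame

// Any failure during the read (including the AnalysisException thrown
// for an empty folder) becomes None instead of a null DataFrame.
def tryReadOrc(path: String): Option[DataFrame] =
  Try(spark.read.format("orc").load(path)).toOption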
Answer 0 (score: 0)
Here is another workaround, though it is not the best solution either: check whether the path exists before reading.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Returns true when the given path (or glob pattern) matches something on the filesystem.
def pathStatus(path: String): Boolean = {
  val config: Configuration = new Configuration()
  val fs: FileSystem = FileSystem.get(config)
  // globStatus returns null when a non-glob path does not exist.
  fs.globStatus(new Path(path)) != null
}
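Note that the null check only tells you a non-glob path does not exist; an existing-but-empty folder still passes it, and a wildcard that matches nothing yields an empty (non-null) array. A hypothetical sketch extending the same idea to guard against both cases, assuming spark and orcFolderPath are in scope:

import org.apache.spark.sql.DataFrame

// Glob the folder's contents: null means the path itself is missing,
// an empty array means the folder exists but holds no files.
val fs = FileSystem.get(new Configuration())
val contents = fs.globStatus(new Path(s"$orcFolderPath/*"))
val df: Option[DataFrame] =
  if (contents != null && contents.nonEmpty)
    Some(spark.read.format("orc").load(orcFolderPath))
  else
    None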