Spark cannot read Parquet files from a partitioned S3 bucket

Asked: 2019-04-05 20:10:44

Tags: scala apache-spark amazon-s3

My S3 bucket is partitioned like this:

bucket
|--2018
|--2019
    |--01
    |--02
       |--01
          |--files.parquet
...

When I read it with this command (Spark 2.1.1), it works fine:

val dfo = sqlContext.read.parquet("s3://bucket/2019/04/03/*")

But when I try to add the partition variables to the path, it fails:

val dfo = sqlContext.read.parquet("s3://bucket/2019/04/day=03/*")
or
val dfo = sqlContext.read.parquet("s3://bucket/y=2019/m=04/day=03") 

Error:

Name: org.apache.spark.sql.AnalysisException
Message: Path does not exist: s3://bucket/2019/04/day=03/*;
StackTrace:   at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:377)
  at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
  at scala.collection.immutable.List.foreach(List.scala:381)
  at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
  at scala.collection.immutable.List.flatMap(List.scala:344)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:441)
  at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:425)
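
A possible reading of the error, offered as a sketch rather than a confirmed diagnosis: Spark resolves the string passed to parquet() as a literal path (plus glob patterns), so a segment such as day=03 only matches if a directory with exactly that name exists in the bucket. In the layout shown above the directories are named 03, not day=03, which would explain "Path does not exist". Assuming the layout stays as it is, one workaround is to read the plain path and attach the partition values as ordinary columns. The object name ReadDay, the helper names, and the column names year, month, and day are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.functions.lit

object ReadDay {
  // Builds the plain (non key=value) path that matches the bucket layout
  // in the question, e.g. s3://bucket/2019/04/03/*
  def dayPath(bucket: String, year: String, month: String, day: String): String =
    s"s3://$bucket/$year/$month/$day/*"

  // Reads one day of Parquet files and adds the path components as
  // constant columns, since Spark cannot infer them from this layout.
  // Column names "year", "month", "day" are assumptions, not from the question.
  def readDay(sqlContext: SQLContext, bucket: String,
              year: String, month: String, day: String): DataFrame =
    sqlContext.read
      .parquet(dayPath(bucket, year, month, day))
      .withColumn("year", lit(year))
      .withColumn("month", lit(month))
      .withColumn("day", lit(day))
}
```

With the sqlContext from the question, this could be called as ReadDay.readDay(sqlContext, "bucket", "2019", "04", "03"). If the data were instead written with directory names like y=2019/m=04/day=03, Spark's partition discovery would pick up those columns on its own when reading the parent directory.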

0 answers:

No answers