IllegalArgumentException: u"Option 'basePath' must be a directory"

Asked: 2019-03-04 08:08:12

Tags: python apache-spark pyspark

I am trying to read partitioned Parquet files from S3 (mocked with localstack) using PySpark (2.4) with hadoop-aws-2.7.3.jar and aws-java-sdk-1.7.4.jar. The files are partitioned as event_year=YYYY/event_month=MM/event_day=DD, so I am using the basePath option.
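For context, Spark's partition discovery infers partition columns from the key=value directory segments that sit below basePath. The following is an illustrative pure-Python sketch of that mapping, not Spark's actual implementation; the helper name partition_columns is hypothetical:

```python
# Illustrative sketch (NOT Spark's code): derive partition columns
# from a key=value directory layout relative to a base path.
import posixpath

def partition_columns(path, base_path):
    """Return a {column: value} dict parsed from key=value segments under base_path."""
    rel = posixpath.relpath(path, base_path)
    cols = {}
    for segment in rel.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            cols[key] = value
    return cols

# Using one of the paths from the question:
print(partition_columns(
    "s3://ubaevents/events/org_pk=2/event_year=2018/event_month=11/",
    "s3://ubaevents/events/",
))
# {'org_pk': '2', 'event_year': '2018', 'event_month': '11'}
```

This is why basePath matters: it tells Spark where the partition-encoding segments begin, so every path passed to the reader must live under it.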

paths = ['s3://ubaevents/events/org_pk=2/event_year=2018/event_month=11/',
         's3://ubaevents/events/org_pk=2/event_year=2018/event_month=12/']
base_path = 's3://ubaevents/events/'
df = spark.read.option("basePath", base_path).parquet(*paths)

df = spark.read.options(basePath=base_path).parquet(*paths)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/amgonen/PycharmProjects/cyber-intel/venv/lib/python2.7/site-packages/pyspark/sql/readwriter.py", line 316, in parquet
    return self._df(self._jreader.parquet(_to_seq(self._spark._sc, path)))
  File "/Users/amgonen/PycharmProjects/cyber-intel/venv/lib/python2.7/site-packages/pyspark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/Users/amgonen/PycharmProjects/cyber-intel/venv/lib/python2.7/site-packages/pyspark/sql/utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u"Option 'basePath' must be a directory"

    
  

0 Answers:

There are no answers yet.