Tags: pyspark
The data is stored on S3 in separate folders by date, and each folder contains partitioned Parquet files, for example:
part-00001-xxxxx-xxxxx-xxxxx-xxxxx.snappy.parquet
part-00002-xxxxx-xxxxx-xxxxx-xxxxx.snappy.parquet
...
part-00030-xxxxx-xxxxx-xxxxx-xxxxx.snappy.parquet
How can I read this data into a PySpark DataFrame?
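
One possible approach, sketched under assumptions the question does not confirm: the date folders sit under a single base prefix and are named by ISO date (for example `s3a://my-bucket/data/2020-01-01/`), and the cluster is already configured for S3 access. The bucket name, prefix, and the helper names `dated_paths` and `read_dated_parquet` are hypothetical. `spark.read.parquet` accepts several paths, and pointing it at a folder reads all the part files inside it.

```python
from datetime import date, timedelta


def dated_paths(base, start, end):
    """Build one folder path per day from start to end, inclusive.

    Assumes folders are named by ISO date (YYYY-MM-DD); adjust the
    formatting if the real layout differs.
    """
    days = (end - start).days
    return [f"{base}/{(start + timedelta(days=i)).isoformat()}" for i in range(days + 1)]


def read_dated_parquet(spark, base, start, end):
    """Read every part file in the selected date folders into one DataFrame.

    `spark` is an existing SparkSession. Each path is a folder, so all
    part-*.snappy.parquet files inside it are picked up together.
    """
    return spark.read.parquet(*dated_paths(base, start, end))
```

With a session created via `SparkSession.builder.getOrCreate()`, a call might look like `df = read_dated_parquet(spark, "s3a://my-bucket/data", date(2020, 1, 1), date(2020, 1, 3))`. If every date folder should be loaded, a glob such as `spark.read.parquet("s3a://my-bucket/data/*/")` may be simpler than listing paths.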