I am trying to write a Pandas DataFrame to partitioned files and upload them directly to Datalake (Gen2), without writing anything to the local file system. I managed to upload a single parquet file using a buffer:
from io import BytesIO
from azure.storage.blob import BlockBlobService

block_blob_service = BlockBlobService(account_name=name, account_key=secret)
buffer = BytesIO()
data.to_parquet(buffer)
block_blob_service.create_blob_from_bytes(container_name="container", blob_name="path/example.parquet", blob=buffer.getvalue())
I tried adding partition_cols to .to_parquet() and setting blob_name to the root directory, like below:
block_blob_service = BlockBlobService(account_name=name, account_key=secret)
buffer = BytesIO()
data.to_parquet(buffer, partition_cols=["Year", "Month", "Day"])
block_blob_service.create_blob_from_bytes(container_name="container", blob_name="path/", blob=buffer.getvalue())
But I got this error:
AttributeError: 'NoneType' object has no attribute '_isfilestore'
I also tried pyarrow.parquet.write_to_dataset(), but it does not seem to support AzureDLFileSystem (ADL) at the moment...
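One idea I have seen suggested, but have not verified myself, is to hand pyarrow an fsspec-compatible filesystem from the adlfs package instead of the ADL client. A minimal sketch of that idea, assuming adlfs is installed and reusing the same account, container and path names as above:

import pyarrow as pa
import pyarrow.parquet as pq
from adlfs import AzureBlobFileSystem  # assumption: adlfs package is available

# fsspec-compatible filesystem for the storage account (Gen2 uses the blob endpoint)
fs = AzureBlobFileSystem(account_name=name, account_key=secret)

table = pa.Table.from_pandas(data)
pq.write_to_dataset(
    table,
    root_path="container/path",                # container and prefix as in the examples above
    partition_cols=["Year", "Month", "Day"],
    filesystem=fs,
)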
How can I write partitioned files directly to Azure Datalake?
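A fallback I can think of is to build the partitions manually: group the DataFrame by the partition columns and upload one parquet blob per partition under a Hive-style path, using the same BlockBlobService client as above. A rough sketch (the path layout and file name are just an example):

from io import BytesIO
from azure.storage.blob import BlockBlobService  # legacy azure-storage-blob (<= 2.x)

block_blob_service = BlockBlobService(account_name=name, account_key=secret)

# One blob per partition, e.g. path/Year=2020/Month=1/Day=5/part-0.parquet
for (year, month, day), part in data.groupby(["Year", "Month", "Day"]):
    buffer = BytesIO()
    # drop the partition columns from the data, as Hive-style partitioning would
    part.drop(columns=["Year", "Month", "Day"]).to_parquet(buffer)
    blob_name = f"path/Year={year}/Month={month}/Day={day}/part-0.parquet"
    block_blob_service.create_blob_from_bytes(
        container_name="container",
        blob_name=blob_name,
        blob=buffer.getvalue(),
    )

Is there a cleaner way than looping over the partitions like this?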