Question

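I am trying to write a Pandas dataframe to partitioned files and upload them directly to Datalake (Gen2), without first saving them to the local file system. I have successfully uploaded a single Parquet file using a buffer: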

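from io import BytesIO
from azure.storage.blob import BlockBlobService  # legacy azure-storage-blob SDK

block_blob_service = BlockBlobService(account_name=name, account_key=secret)
buffer = BytesIO()
data.to_parquet(buffer)  # serialize the whole dataframe into memory
block_blob_service.create_blob_from_bytes(container_name="container", blob_name="path/example.parquet", blob=buffer.getvalue())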

I then tried adding partition_cols to .to_parquet() and setting blob_name to the root directory, as below:

block_blob_service = BlockBlobService(account_name=name, account_key=secret)
buffer = BytesIO()
data.to_parquet(buffer, partition_cols=["Year", "Month", "Day"])
block_blob_service.create_blob_from_bytes(container_name="container", blob_name="path/", blob=buffer.getvalue())

But I get this error:

AttributeError: 'NoneType' object has no attribute '_isfilestore'

I also tried pyarrow.parquet.write_to_dataset(), but it does not currently seem to support AzureDLFileSystem (ADL)...

How can I write partitioned files directly to Azure Datalake?
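
For illustration, here is a sketch of one possible workaround; it is not a confirmed solution. The error probably occurs because partition_cols makes to_parquet() write a directory tree, which a single in-memory buffer cannot represent. The sketch avoids this by grouping the dataframe on the partition columns, serializing each group into its own buffer (which already works above), and uploading it under a Hive-style `col=value` blob name. The helper names `partition_blob_name` and `upload_partitions` and the file name `part-0.parquet` are hypothetical, and the sketch assumes the container treats `/` in blob names as path separators.

```python
from io import BytesIO


def partition_blob_name(root, partition_cols, key, filename="part-0.parquet"):
    """Build a Hive-style blob name, e.g. path/Year=2020/Month=1/Day=2/part-0.parquet."""
    # groupby yields a scalar key for a single column in some pandas versions.
    if not isinstance(key, tuple):
        key = (key,)
    parts = [f"{col}={val}" for col, val in zip(partition_cols, key)]
    # Skip an empty root so the blob name never starts with "/".
    return "/".join(p for p in [root.strip("/"), *parts, filename] if p)


def upload_partitions(block_blob_service, container_name, data, partition_cols, root):
    """Upload one Parquet blob per partition of `data` (a pandas DataFrame).

    Hypothetical helper: `block_blob_service` is assumed to be the
    BlockBlobService instance from the question.
    """
    for key, group in data.groupby(partition_cols):
        buffer = BytesIO()
        # The partition values are encoded in the path, so drop those columns.
        group.drop(columns=partition_cols).to_parquet(buffer, index=False)
        block_blob_service.create_blob_from_bytes(
            container_name=container_name,
            blob_name=partition_blob_name(root, partition_cols, key),
            blob=buffer.getvalue(),
        )
```

With the names from the question, the call would be upload_partitions(block_blob_service, "container", data, ["Year", "Month", "Day"], "path/"). Each partition is held in memory as a whole, so very large groups may need to be split into several files.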

How to write partitioned Parquet files from a pandas dataframe to Azure Datalake Gen2 in Python

0 answers: