Write a pandas DataFrame to S3 as Parquet

Time: 2020-02-01 11:27:26

Tags: python pandas amazon-s3 airflow

How can I write a Parquet dataset partitioned by a column to S3? This is what I am trying:

def write_df_into_s3(df, bucket_name, filepath, format="parquet"):
    buffer = None
    hook = S3Hook()

    if format == "parquet":
        buffer = BytesIO()
        df.to_parquet(buffer, index=False, partition_cols=['date'])
    else:
        raise Exception("Format not implemented!")

    hook.load_bytes(buffer.getvalue(), filepath, bucket_name)

    return f"s3://{bucket_name}/{filepath}"

But I get the error: 'NoneType' object has no attribute '_isfilestore'

1 Answer:

Answer 0 (score: 0):

For Python 3.6+, AWS has a library called aws-data-wrangler that helps with the integration between Pandas, S3, and Parquet.

To install it, run:

pip install awswrangler

If you want to write your pandas DataFrame to S3 as a partitioned Parquet dataset, do:

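import awswrangler as wr

wr.s3.to_parquet(
    df=df,  # the DataFrame to write (the keyword is df in awswrangler 1.x and later)
    path="s3://my-bucket/key/",  # S3 prefix under which the dataset is written
    dataset=True,  # dataset mode, needed for partitioned output
    partition_cols=["date"],  # one date=<value>/ prefix per distinct value
)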