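I have many CSV files in an S3 bucket, with full names like this:
fullname = "s3://mybucket/part-00000-46acaa37-75ba.csv"
I need to read the files one by one, so I tried using * in the file name:
path = "s3://mybucket/*.csv"
The following code works if I use the full name, but if I use *, I get an error about the key.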
import pandas as pd
import io
import boto3
session = boto3.Session()
s3_client = session.client('s3')
bucket = "mybucket"
key = "part-00000-46acaa37-75ba.csv"
# get_object needs the exact key of one object; it does not expand wildcards
obj = s3_client.get_object(Bucket=bucket, Key=key)
df = pd.read_csv(io.BytesIO(obj['Body'].read()))
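How can I solve this and read all the files in the bucket? Thanks.
Answer 0 (score: 1)
You should extend your code to list the keys in the bucket, keep the CSV ones, and read each of them in turn.
Example: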
import pandas as pd
import io
import boto3

session = boto3.Session()
s3_client = session.client('s3')
bucket = "mybucket"

# Get all CSV files in the bucket
def get_csv_files(client, bucket):
    csv_files = []
    # Note: list_objects returns at most 1000 keys per call, and
    # 'Contents' is missing from the response when nothing is listed.
    content = client.list_objects(Bucket=bucket).get('Contents', [])
    for obj in content:
        key = obj.get('Key')
        if key.endswith('.csv'):
            csv_files.append(key)
    return csv_files

for key in get_csv_files(s3_client, bucket):
    file_body = s3_client.get_object(Bucket=bucket, Key=key).get('Body')
    df = pd.read_csv(io.BytesIO(file_body.read()))
    # -> Do Something with DataFrame 'df'
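A single listing call returns at most 1,000 keys, so for a larger bucket the listing has to be paginated. The sketch below is one possible way to do that while keeping the "*.csv" style pattern from the question: it walks every page with the boto3 paginator for list_objects_v2 and filters keys on the client side with fnmatch. The helper name iter_matching_keys and its pattern parameter are made up for illustration and are not part of boto3.

```python
import fnmatch


def iter_matching_keys(client, bucket, pattern="*.csv"):
    """Yield every key in `bucket` whose name matches the glob `pattern`.

    `client` is assumed to be a boto3 S3 client (hypothetical helper,
    not a boto3 API). Matching happens locally; S3 itself has no wildcards.
    """
    # The paginator follows continuation tokens until the listing is exhausted.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        # "Contents" is absent from a page that holds no objects.
        for obj in page.get("Contents", []):
            # fnmatch's "*" also matches "/", so "*.csv" includes nested keys.
            if fnmatch.fnmatchcase(obj["Key"], pattern):
                yield obj["Key"]


# Usage sketch (assumes boto3 and pandas are installed and AWS credentials
# are configured; "mybucket" is the bucket name from the question):
#
#   import io, boto3, pandas as pd
#   s3_client = boto3.Session().client("s3")
#   frames = []
#   for key in iter_matching_keys(s3_client, "mybucket"):
#       body = s3_client.get_object(Bucket="mybucket", Key=key)["Body"]
#       frames.append(pd.read_csv(io.BytesIO(body.read())))
#   df = pd.concat(frames, ignore_index=True)
```

Whether to concatenate the frames or process them one at a time depends on how much data the bucket holds; the concatenation in the usage comment assumes all files share the same columns.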