Question

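I am currently trying to write a DataFrame to a temporary file and then upload that temporary file to an S3 bucket. When I run my code, nothing happens. Any help would be greatly appreciated. Here is my code: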

import csv
import pandas as pd
import boto3
import tempfile
import os 

s3 = boto3.client('s3', aws_access_key_id = access_key, aws_secret_access_key = secret_key, region_name = region)

temp = tempfile.TemporaryFile()
largedf.to_csv(temp, sep = '|')
s3.put_object(temp, Bucket = '[BUCKET NAME]', Key = 'test.txt')
temp.close()

Answer 1

The file handle you pass to s3.put_object is positioned at the end of the file, so when you .read from it, it returns an empty string.

>>> import tempfile
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(10,50, (5,5)))
>>> temp = tempfile.TemporaryFile(mode='w+')
>>> df.to_csv(temp)
>>> temp.read()
''

A quick fix is to .seek back to the beginning...

>>> temp.seek(0)
0
>>> print(temp.read())
,0,1,2,3,4
0,11,42,40,45,11
1,36,18,45,24,25
2,28,20,12,33,44
3,45,39,14,16,20
4,40,16,22,30,37
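Applied to the code in the question, the seek fix might look like the sketch below. The function name upload_via_tempfile and its parameters are made up for illustration; client is assumed to be a boto3 S3 client (or anything with a compatible put_object method), and df is assumed to be a pandas DataFrame. Note that boto3 client methods accept keyword arguments only, so the file object has to be passed as Body rather than positionally.

```python
import tempfile


def upload_via_tempfile(df, client, bucket, key, sep="|"):
    """Write df as CSV to a temporary file, rewind it, and upload it.

    Hypothetical helper: `client` is assumed to be a boto3 S3 client.
    """
    with tempfile.TemporaryFile(mode="w+b") as temp:
        # to_csv() called without a path returns the CSV text as a str.
        temp.write(df.to_csv(sep=sep).encode("utf-8"))
        # After writing, the position is at the end of the file.
        # Without this seek, the upload would read zero bytes.
        temp.seek(0)
        # The data goes in Body, which accepts bytes or a binary
        # file-like object.
        client.put_object(Body=temp, Bucket=bucket, Key=key)
```

With the names from the question, this would be called as upload_via_tempfile(largedf, s3, '[BUCKET NAME]', 'test.txt'). The temporary file is closed automatically when the with block exits.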

Note that writing to disk is unnecessary; in fact, you can keep everything in memory by using a buffer, for example:

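from io import StringIO  # on Python 2, use: from cStringIO import StringIO
buffer = StringIO()
df.to_csv(buffer)  # to_csv is a DataFrame method, not a function on the pandas module
# put_object takes keyword arguments only; the data goes in Body
s3.put_object(Body=buffer.getvalue().encode('utf-8'), Bucket='[BUCKET NAME]', Key='test.txt')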
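One detail about in-memory buffers that may be useful here: StringIO.getvalue() returns the entire contents regardless of the current position, whereas read() only returns what lies after the position. The small sketch below, using only the standard library and made-up sample data, shows the difference and how the text could be turned into bytes for an upload body.

```python
from io import StringIO

buffer = StringIO()
buffer.write("a|b\n1|2\n")  # sample CSV text, for illustration only

# The position is at the end after writing, so read() finds nothing.
at_end = buffer.read()

# getvalue() ignores the position and returns everything written so far.
everything = buffer.getvalue()

# Rewinding makes read() return the full contents as well.
buffer.seek(0)
after_seek = buffer.read()

# Encode to bytes if the consumer expects binary data.
payload = everything.encode("utf-8")
```

Either getvalue() or seek(0) followed by read() gives the full text; getvalue() just avoids having to track the position.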

Python: writing a temporary file to S3
