Question

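I want to transfer the contents of a folder from an FTP server to a bucket in S3 without writing to disk. Currently, S3 gets the names of all the files in the folder, but none of the actual data. Each file in the folder is only a few bytes in size. I'm not sure why the whole file isn't being uploaded.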

from ftplib import FTP
import io 
import boto3


s3= boto3.resource('s3')

ftp = FTP('ftp.ncbi.nlm.nih.gov')
ftp.login()
ftp.cwd('pubchem/RDF/descriptor/compound')

address =  'ftp.ncbi.nlm.nih.gov/pubchem/RDF/descriptor/compound/'

filelist = ftp.nlst()

for x in range(0, len(filelist)-1):
    myfile = io.BytesIO()
    filename = 'RETR ' + filelist[x]
    resp = ftp.retrbinary(filename, myfile.write)
    myfile.seek(0)
    path = address + filelist[x]
    #putting file on s3
    s3.Object(s3bucketname, path).put(Body = resp)


ftp.quit()

Is there a way to make sure the whole file is uploaded?

Answer 1

We can use Python to stream data from an FTP server to S3. The data is not downloaded to the /tmp location of AWS Lambda; it is streamed directly from FTP to the S3 bucket.

from ftplib import FTP
import s3fs

def lambda_handler(event, context):
    file_name = "test.txt"  # file name on the FTP server
    s3 = s3fs.S3FileSystem(anon=False)
    ftp_path = "<ftp_path>"
    s3_path = "s3-dev"  # S3 bucket name

    with FTP("<ftp_server>") as ftp:
        ftp.login()
        ftp.cwd(ftp_path)
        # Close the S3 file when the transfer ends so the upload is finalized
        with s3.open("{}/{}".format(s3_path, file_name), 'wb') as s3_file:
            ftp.retrbinary('RETR ' + file_name, s3_file.write)
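
The handler above copies a single file, while the question asks about a whole folder. A minimal sketch of the same streaming idea applied to every entry returned by ftp.nlst() follows. The function name stream_ftp_folder is hypothetical, fs is assumed to be an s3fs.S3FileSystem (or anything with a compatible open()), and the FTP working directory is assumed to contain only regular files.

```python
def stream_ftp_folder(ftp, fs, bucket):
    """Stream every file in the FTP working directory into `bucket`.

    `ftp` is an open ftplib.FTP connection that has already been moved to
    the source folder with cwd(). `fs` is assumed to be an
    s3fs.S3FileSystem instance.
    """
    copied = []
    for name in ftp.nlst():
        target = "{}/{}".format(bucket, name)
        # Leaving the with-block closes the S3 file, which completes the upload.
        with fs.open(target, 'wb') as s3_file:
            ftp.retrbinary('RETR ' + name, s3_file.write)
        copied.append(target)
    return copied
```

Inside lambda_handler this could be called as stream_ftp_folder(ftp, s3, s3_path) after ftp.cwd(ftp_path).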

Answer 2

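I had the same problem and got it working when I changed .put() to read the actual file from where it was saved. That way you are not dealing with resp directly.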
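
The reason resp fails is that ftp.retrbinary() returns only the server's reply line (something like "226 Transfer complete"), while the file data goes to the callback, here myfile.write. Uploading resp therefore stores a few bytes of status text under each name. Below is a minimal sketch that keeps everything in memory, as the question asks, instead of reading from a saved location: it passes the rewound BytesIO buffer as Body. The helper names fetch_to_buffer and copy_ftp_folder_to_s3 are hypothetical, s3 is assumed to be boto3.resource('s3'), and the folder is assumed to contain only regular files.

```python
import io


def fetch_to_buffer(ftp, filename):
    """Download one file from an open ftplib.FTP connection into memory."""
    buffer = io.BytesIO()
    # retrbinary() hands each chunk to the callback; its return value is
    # only the server's status line, not the file contents.
    ftp.retrbinary('RETR ' + filename, buffer.write)
    buffer.seek(0)  # rewind so the upload reads from the first byte
    return buffer


def copy_ftp_folder_to_s3(ftp, s3, bucket_name, prefix=''):
    """Upload every file in the FTP working directory to an S3 bucket.

    `s3` is assumed to be a boto3 S3 service resource.
    """
    uploaded = []
    # Iterating over the list directly visits every entry, including the last.
    for filename in ftp.nlst():
        key = prefix + filename
        s3.Object(bucket_name, key).put(Body=fetch_to_buffer(ftp, filename))
        uploaded.append(key)
    return uploaded
```

With the names from the question this could be called as copy_ftp_folder_to_s3(ftp, s3, s3bucketname, address). Iterating over ftp.nlst() directly also avoids skipping the last file, which range(0, len(filelist)-1) does.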
