Question

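I have a list of file URLs, which are download links. I wrote Python code to download the files to my computer. Here is the problem: there are about 500 files in the list, and after downloading about 50 of them, Chrome becomes unresponsive. My original goal was to upload all the downloaded files to a bucket in S3. Is there a way to send the files directly to S3? Here is what I have written so far: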

import requests
from itertools import chain
import webbrowser

url = "<my_url>"
username = "<my_username>"
password = "<my_password>"
headers = {"Content-Type":"application/xml","Accept":"*/*"}

response = requests.get(url, auth=(username, password), headers = headers)
if response.status_code != 200:
    print('Status:', response.status_code, 'Headers:', response.headers, 'Error Response:', response.json())
    exit()

data = response.json()
values = list(chain.from_iterable(data.values()))
links = [lis['download_link'] for lis in values]
for item in links:
    webbrowser.open(item)

Answer 1

With Python 3 and boto3 (the AWS SDK) this is very simple, for example:

import boto3

s3 = boto3.client('s3')
with open('filename.txt', 'rb') as data:
    s3.upload_fileobj(data, 'bucketname', 'filenameintos3.txt')

For more information, read the boto3 documentation here: http://boto3.readthedocs.io/en/latest/guide/s3-example-creating-buckets.html
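To send the files to S3 without saving them locally or opening them in a browser, one option is to stream each HTTP response body into `upload_fileobj`. The following is only a minimal sketch under some assumptions: the helper names `key_from_url` and `stream_links_to_s3` are made up for illustration, the object key is taken from the last path segment of each link (which may not suit your URLs), and the S3 client and HTTP session are passed in by the caller.

```python
import os
from urllib.parse import urlparse


def key_from_url(url: str) -> str:
    """Hypothetical helper: use the last path segment of the URL as the S3 key."""
    name = os.path.basename(urlparse(url).path)
    return name or "download"  # fallback name is an assumption


def stream_links_to_s3(links, bucket, s3_client, http_session, auth=None):
    """Copy each download link straight into S3 without writing to disk.

    s3_client is expected to behave like boto3.client('s3') and
    http_session like requests.Session(); both are supplied by the caller.
    """
    uploaded = []
    for url in links:
        # stream=True avoids loading the whole body into memory at once
        with http_session.get(url, auth=auth, stream=True) as resp:
            resp.raise_for_status()
            # Ask the underlying stream to undo gzip/deflate transfer encoding
            resp.raw.decode_content = True
            key = key_from_url(url)
            # upload_fileobj reads from any binary file-like object
            s3_client.upload_fileobj(resp.raw, bucket, key)
            uploaded.append(key)
    return uploaded

# Possible usage with the names from the question:
#   import boto3, requests
#   stream_links_to_s3(links, "bucketname", boto3.client("s3"),
#                      requests.Session(), auth=(username, password))
```

Processing the links one at a time like this also avoids opening hundreds of browser tabs, which is likely what made Chrome unresponsive.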

Enjoy

Answer 2

If the AWS CLI is installed on your system, you can use the subprocess library. For example:

import subprocess

def copy_file_to_s3(source: str, target: str, bucket: str):
    # Shell out to the AWS CLI; check=True raises CalledProcessError if the copy fails
    subprocess.run(["aws", "s3", "cp", source, f"s3://{bucket}/{target}"], check=True)

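Similarly, you can use the same logic for all kinds of AWS client operations, such as downloading or listing files. This way you do not need to import Boto3. I suspect the CLI was not meant to be used this way, but in practice I find it very convenient. You also get the upload status shown in the console, for example: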

Completed 3.5 GiB/3.5 GiB (242.8 MiB/s) with 1 file(s) remaining

To adapt the method to your needs, I recommend consulting the subprocess reference and the AWS CLI reference.
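As a rough illustration of the "downloading or listing" operations mentioned in this answer, the same subprocess pattern might be generalized as follows. This is a sketch, not a definitive wrapper: the function names `s3_cli_command` and `run_s3_cli` are hypothetical, and it assumes the `aws` executable is on the PATH with credentials already configured.

```python
import subprocess


def s3_cli_command(action: str, *args: str) -> list:
    """Hypothetical helper: build the argument list for an `aws s3 <action>` call."""
    return ["aws", "s3", action, *args]


def run_s3_cli(action: str, *args: str) -> str:
    """Run the command and return its standard output as text.

    check=True raises subprocess.CalledProcessError on a non-zero exit code.
    Capturing output means progress lines are not shown live in the console.
    """
    result = subprocess.run(
        s3_cli_command(action, *args),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout

# Possible usage (bucket and key names are placeholders):
#   print(run_s3_cli("ls", "s3://bucketname/"))                         # list objects
#   run_s3_cli("cp", "s3://bucketname/filenameintos3.txt", "local.txt")  # download
```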

Uploading files to S3 with Python
