Question

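I am working in a Jupyter notebook with Python. I am trying to read all the Parquet files in a folder of an AWS S3 bucket and save them as JSON files in a folder of my Jupyter directory. I have the following code, but I believe it only reads the files; I want to save them as JSON. Thanks!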

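import boto3

# Assumption: `response` in the original snippet is a boto3 S3 resource.
response = boto3.resource('s3')
bucketname = 'my-bucket'
bucket = response.Bucket(bucketname)
for obj in bucket.objects.all():
    key = obj.key
    body = obj.get()['Body'].read()  # reads the object into memory only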

Answer 1

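If I understand your question correctly, you want to download the files to the file system rather than load them into memory. Here is a sample code snippet that does this.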

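import boto3

response = boto3.resource('s3')  # assumed: an S3 resource, as in the question
bucketname = 'my-bucket'
bucket = response.Bucket(bucketname)
for obj in bucket.objects.all():
    # download each object to a local file instead of reading it into memory
    obj.Object().download_file('<specify-the-local-filename>')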

You can find the documentation here.
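The loop above writes every object to the same placeholder filename. As a rough sketch of one way to give each file its own local name, the following derives the filename from the object key. The names `local_path_for` and `download_folder`, and the `prefix` and `out_dir` parameters, are hypothetical and not part of the original answer; it assumes boto3 is installed and AWS credentials are configured.

```python
import os


def local_path_for(key, out_dir):
    """Map an S3 key such as 'folder/part-0.parquet' to a path inside out_dir."""
    return os.path.join(out_dir, os.path.basename(key))


def download_folder(bucket_name, prefix, out_dir):
    """Download every .parquet object under `prefix` into `out_dir` (hypothetical helper)."""
    import boto3  # imported here so local_path_for can be used without boto3

    os.makedirs(out_dir, exist_ok=True)
    bucket = boto3.resource("s3").Bucket(bucket_name)
    paths = []
    for obj in bucket.objects.filter(Prefix=prefix):
        if not obj.key.endswith(".parquet"):
            continue  # skip folder markers and other file types
        path = local_path_for(obj.key, out_dir)
        bucket.download_file(obj.key, path)
        paths.append(path)
    return paths
```

A call might look like `download_folder('my-bucket', 'my-folder/', 'downloads')`, where the folder and directory names are placeholders. Note that two keys with the same base name in different subfolders would overwrite each other in this sketch.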

Answer 2

The parquet pip module will do this: https://pypi.org/project/parquet/. They also have an example, copied here for quick reference:

import parquet
import json

## assuming parquet file with two rows and three columns:
## foo bar baz
## 1   2   3
## 4   5   6

with open("test.parquet", "rb") as fo:  # Parquet is a binary format, so open in binary mode
   # prints:
   # {"foo": 1, "bar": 2}
   # {"foo": 4, "bar": 5}
   for row in parquet.DictReader(fo, columns=['foo', 'bar']):
       print(json.dumps(row))
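The example above prints rows from a local file. For the goal stated in the question (saving each Parquet file in an S3 folder as JSON), here is a minimal sketch that swaps in pandas instead of the parquet module. It assumes boto3, pandas, and a Parquet engine such as pyarrow are installed; `json_name_for`, `parquet_objects_to_json`, `prefix`, and `out_dir` are hypothetical names chosen for illustration.

```python
import io
import os


def json_name_for(key):
    """Turn an S3 key such as 'folder/data.parquet' into 'data.json'."""
    stem, _ext = os.path.splitext(os.path.basename(key))
    return stem + ".json"


def parquet_objects_to_json(bucket_name, prefix, out_dir):
    """Read each .parquet object under `prefix` and write it as JSON in `out_dir`."""
    import boto3
    import pandas as pd  # assumes pyarrow or fastparquet is available

    os.makedirs(out_dir, exist_ok=True)
    bucket = boto3.resource("s3").Bucket(bucket_name)
    for obj in bucket.objects.filter(Prefix=prefix):
        if not obj.key.endswith(".parquet"):
            continue  # skip folder markers and other file types
        body = obj.get()["Body"].read()  # raw Parquet bytes, as in the question
        df = pd.read_parquet(io.BytesIO(body))
        out_path = os.path.join(out_dir, json_name_for(obj.key))
        # one JSON object per line; drop lines=True for a single JSON array
        df.to_json(out_path, orient="records", lines=True)
```

This reads each whole file into memory, which is fine for small files but may need rethinking for very large ones.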

How to read Parquet files from an AWS S3 bucket and save them as JSON in Jupyter
