How do I read an image file from an S3 bucket directly into memory?

Asked: 2017-05-18 08:53:34

Tags: python matplotlib amazon-s3 boto3

I have the following code:

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import boto3
s3 = boto3.resource('s3', region_name='us-east-2')
bucket = s3.Bucket('sentinel-s2-l1c')
object = bucket.Object('tiles/10/S/DG/2015/12/7/0/B01.jp2')
object.download_file('B01.jp2')
img=mpimg.imread('B01.jp2')
imgplot = plt.imshow(img)
plt.show(imgplot)

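It works. The problem is that it first downloads the file to the current directory. Is it possible to read the file directly into RAM and decode it as an image?

7 answers:

Answer 0 (score: 18)

I would suggest using the io module to read the file directly into memory, without having to use a temporary file.

For example: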

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import boto3
import io

s3 = boto3.resource('s3', region_name='us-east-2')
bucket = s3.Bucket('sentinel-s2-l1c')
object = bucket.Object('tiles/10/S/DG/2015/12/7/0/B01.jp2')

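file_stream = io.BytesIO()  # image data is binary, so use BytesIO rather than StringIO
object.download_fileobj(file_stream)
file_stream.seek(0)  # rewind so imread() starts at the first byte
img = mpimg.imread(file_stream, format='jp2')  # a stream has no file extension to infer the format from
# whatever you need to do

Note that io.StringIO only holds text; binary data such as an image needs io.BytesIO.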
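To illustrate why the buffer type matters: download_fileobj writes raw bytes into the file-like object it is given, so the buffer has to accept bytes. The following is a minimal sketch using only the standard library; write_chunk is a hypothetical helper introduced for illustration, and the sample bytes merely stand in for downloaded image data.

```python
import io


def write_chunk(buffer, chunk: bytes) -> bool:
    """Try to write raw bytes into a buffer; report whether it was accepted."""
    try:
        buffer.write(chunk)
        return True
    except TypeError:
        # io.StringIO only accepts str, so a bytes chunk is rejected here.
        return False


# Sample binary data standing in for a downloaded image chunk.
chunk = b"\x00\x00\x00\x0cjP  "

print(write_chunk(io.BytesIO(), chunk))   # True
print(write_chunk(io.StringIO(), chunk))  # False
```

In other words, a text buffer fails as soon as the first chunk of binary image data arrives, while a bytes buffer accepts it.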

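Answer 1 (score: 11)

Greg Merritt's answer is the better approach.

I would like to suggest using NamedTemporaryFile from Python's tempfile module. It creates a temporary file that is deleted when the file is closed (thanks @NoamG).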

import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import numpy as np
import boto3
import tempfile

s3 = boto3.resource('s3', region_name='us-east-2')
bucket = s3.Bucket('sentinel-s2-l1c')
object = bucket.Object('tiles/10/S/DG/2015/12/7/0/B01.jp2')
tmp = tempfile.NamedTemporaryFile()

with open(tmp.name, 'wb') as f:
    object.download_fileobj(f)
    img=mpimg.imread(tmp.name)
    # ...Do jobs using img

Answer 2 (score: 6)

By specifying the file format in imread(), you can read the image straight from a stream.

import boto3
from io import BytesIO
import matplotlib.image as mpimg
import matplotlib.pyplot as plt

resource = boto3.resource('s3', region_name='us-east-2')
bucket = resource.Bucket('sentinel-s2-l1c')

image_object = bucket.Object('tiles/10/S/DG/2015/12/7/0/B01.jp2')
image = mpimg.imread(BytesIO(image_object.get()['Body'].read()), 'jp2')

plt.figure(0)
plt.imshow(image)
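An in-memory stream carries no file extension, which is why the format is passed explicitly above. If the format is not known in advance, one option is to look at the first bytes of the downloaded data. The sketch below is only an illustration: guess_image_format and SIGNATURES are hypothetical names, and only three well-known file signatures are checked.

```python
from typing import Optional

# Leading "magic" bytes of a few common image formats.
SIGNATURES = (
    (b"\x00\x00\x00\x0cjP  \r\n\x87\n", "jp2"),  # JPEG 2000 signature box
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
)


def guess_image_format(data: bytes) -> Optional[str]:
    """Return a format hint based on the leading bytes, or None if unknown."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


print(guess_image_format(b"\x89PNG\r\n\x1a\n" + b"rest of file"))  # png
```

The returned string could then be passed as the format argument of imread(), assuming the installed matplotlib and Pillow can decode that format.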

Answer 3 (score: 2)

A slightly different approach, using the client:

import boto3
import io
from matplotlib import pyplot as plt

client = boto3.client("s3")

bucket='my_bucket'
key= 'my_key'

outfile = io.BytesIO()
client.download_fileobj(bucket, key, outfile)
outfile.seek(0)
img = plt.imread(outfile)

plt.imshow(img)
plt.show()
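The outfile.seek(0) call above is easy to overlook. After the download, the buffer's position sits at the end of the data, so a reader that starts from the current position sees nothing. The sketch below shows this with a stand-in client; FakeS3Client and fetch_to_buffer are hypothetical names used only for illustration, and the fake client mimics nothing but the download_fileobj(bucket, key, fileobj) call.

```python
import io


class FakeS3Client:
    """Hypothetical stand-in for boto3.client("s3"); not part of boto3."""

    def __init__(self, payload: bytes):
        self._payload = payload

    def download_fileobj(self, bucket, key, fileobj):
        # Like the real call, write the object's bytes into the given file object.
        fileobj.write(self._payload)


def fetch_to_buffer(client, bucket, key, rewind=True) -> io.BytesIO:
    """Download an object into memory, optionally rewinding the buffer."""
    outfile = io.BytesIO()
    client.download_fileobj(bucket, key, outfile)
    if rewind:
        outfile.seek(0)  # move back to the start so readers see the data
    return outfile


client = FakeS3Client(b"image bytes")
print(fetch_to_buffer(client, "my_bucket", "my_key").read())                # b'image bytes'
print(fetch_to_buffer(client, "my_bucket", "my_key", rewind=False).read())  # b''
```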

Answer 4 (score: 0)

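# bucket is the s3.Bucket created in the question
object = bucket.Object('tiles/10/S/DG/2015/12/7/0/B01.jp2')
# img_data holds the raw, still-encoded bytes of the image
img_data = object.get().get('Body').read()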

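Answer 5 (score: 0)

This builds on Greg Merritt's answer and addresses the errors raised in its comments: it uses BytesIO instead of StringIO, and PIL Image instead of matplotlib.image.

The following functions work with python3 and boto3. The write_image_to_s3 function is included as a bonus.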

import boto3
from PIL import Image
from io import BytesIO
import numpy as np

def read_image_from_s3(bucket, key, region_name='ap-southeast-1'):
    """Load image file from s3.

    Parameters
    ----------
    bucket: string
        Bucket name
    key : string
        Path in s3

    Returns
    -------
    np array
        Image array
    """
    s3 = boto3.resource('s3', region_name=region_name)
    bucket = s3.Bucket(bucket)
    object = bucket.Object(key)
    response = object.get()
    file_stream = response['Body']
    im = Image.open(file_stream)
    return np.array(im)

def write_image_to_s3(img_array, bucket, key, region_name='ap-southeast-1'):
    """Write an image array into S3 bucket

    Parameters
    ----------
    bucket: string
        Bucket name
    key : string
        Path in s3

    Returns
    -------
    None
    """
    s3 = boto3.resource('s3', region_name=region_name)
    bucket = s3.Bucket(bucket)
    object = bucket.Object(key)
    file_stream = BytesIO()
    im = Image.fromarray(img_array)
    im.save(file_stream, format='jpeg')
    object.put(Body=file_stream.getvalue())

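Answer 6 (score: 0)

Hyeungshik Jung's temporary-file solution looks good, but I noticed that the file seems to be downloaded in a somewhat lazy way. This leads to the following behavior: img.shape comes back as an empty tuple of dimensions, (), even though object.download_fileobj(f) has already been called. I solved this by applying f.seek(0,2) to the file descriptor. After that, all subsequent operations work as expected, for example the correct dimensions (704, 1024) are returned.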

...
tmp = tempfile.NamedTemporaryFile()

with open(tmp.name, 'wb') as f:
    object.download_fileobj(f)
    f.seek(0,2)  # seeking on a buffered file also flushes pending writes to disk
    img=mpimg.imread(tmp.name)
    print(img.shape)
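One plausible explanation for this behavior (an assumption on my part, not something the answer states) is write buffering rather than a lazy download: bytes written through f may still sit in Python's buffer when imread() opens the same path a second time. The sketch below shows the effect with the standard library only; sizes_before_and_after_flush is a hypothetical helper and the 100-byte payload is arbitrary.

```python
import os
import tempfile


def sizes_before_and_after_flush(payload: bytes):
    """Return the on-disk file size before and after an explicit flush()."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "B01.jp2")
        with open(path, "wb") as f:
            f.write(payload)                # a small write stays in Python's buffer
            before = os.path.getsize(path)  # what a second reader of the path would see
            f.flush()                       # hand the buffered bytes to the OS
            after = os.path.getsize(path)
    return before, after


before, after = sizes_before_and_after_flush(b"\x00" * 100)
print(before, after)  # typically: 0 100
```

If that is indeed the cause, calling f.flush() after the download, or reading the image only after the with block has closed the file, should have the same effect as the seek.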