Question

我正在尝试为我的Caffe机器学习项目创建一个LMDB数据库。但LMDB首次尝试插入数据点时会抛出错误，说环境贴图已满。

这是尝试填充数据库的代码：

import numpy as np
from PIL import Image
import os
import lmdb
import random
# my data structure for holding image/label pairs
from serialization import DataPoint

class LoadImages(object):
    def __init__(self, image_data_path):
        self.image_data_path = image_data_path
        self.dirlist = os.listdir(image_data_path)

        # find the number of images that are to be read from disk
        # in this case there are 370 images.
        num = len(self.dirlist)

        # shuffle the list of image files so that they are read in a random order
        random.shuffle(self.dirlist)

        map_size = num*10

        j=0

        # load images from disk
        for image_filename in os.listdir(image_data_path):
            # check that every image belongs to either category _D_ or _P_
            assert (image_filename[:3] == '_D_' or image_filename[:3] == '_P_'), "ERROR: unknown category"

            # set up the LMDB datbase object
            env = lmdb.open('image_lmdb', map_size=map_size)
            with env.begin(write=True) as txn:

                # iterate over (shuffled) list of image files
                for image_filename in self.dirlist:
                    print "Loading " + str(j) + "th image from disk - percentage complete:  " + str((float(j)/num) * 100) + " %"

                    # open the image
                    with open(str(image_data_path + "/" + image_filename), 'rb') as f:
                        image = Image.open(f)
                        npimage = np.asarray(image, dtype=np.float64)

                    # discard alpha channel, if necessary
                    if npimage.shape[2] == 4:
                        npimage = npimage[:,:,:3]
                        print image_filename + " had its alpha channel removed."

                    # get category
                    if image_filename[:3] == '_D_':
                        category = 0
                    elif image_filename[:3] == '_P_':
                        category = 1

                    # wrap image data and label into a serializable data structure
                    datapoint = DataPoint(npimage, category)
                    serialized_datapoint = datapoint.serialize()

                    # a database key
                    str_id = '{:08}'.format(j)

                    # put the data point in the LMDB
                    txn.put(str_id.encode('ascii'), serialized_datapoint)

                j+=1

我还制作了一个小数据结构来保存图像和标签并将它们序列化，如上所述：

import numpy as np

class DataPoint(object):
    def __init__(self, image=None, label=None, dtype=np.float64):
        self.image = image
        if self.image is not None:
            self.image = self.image.astype(dtype)
        self.label = label

    def serialize(self):
        image_string = self.image.tobytes()
        label_string = chr(self.label)
        datum_string = label_string + image_string
        return datum_string

    def deserialize(self, string):
        image_string = string[1:]
        label_string = string[:1]
        image = np.fromstring(image_string, dtype=np.float64)
        label = ord(label_string)
        return DataPoint(image, label)

这是错误：

/usr/bin/python2.7 /home/hal9000/PycharmProjects/Caffe_Experiments_0.6/gather_images.py
Loading 0th image from disk - percentage complete:  0.0 %
Traceback (most recent call last):
  File "/home/hal9000/PycharmProjects/Caffe_Experiments_0.6/gather_images.py", line 69, in <module>
    g = LoadImages(path)
  File "/home/hal9000/PycharmProjects/Caffe_Experiments_0.6/gather_images.py", line 62, in __init__
    txn.put(str_id.encode('ascii'), serialized_datapoint)
lmdb.MapFullError: mdb_put: MDB_MAP_FULL: Environment mapsize limit reached

Answer 1

地图大小是整个数据库的最大大小，包括元数据 - 它显示您使用了预期记录的数量。

你增加这个数字

Answer 2

每张图片只有10个字节吗？

除数据库中的图像外，还有其他信息。因此，为LMDB数据库保留更多空间。例如，此命令为磁盘驱动器上的LMDB保留1GB（10 ** 9字节）：

env = lmdb.open('image_lmdb', map_size=int(1e9))

我错误地使用LMDB吗？它表示在0次插入后达到的环境贴图限制

2 个答案: