I am trying to create a TFRecord file from a folder of NumPy arrays. The folder contains about 2000 NumPy files, each about 50 MB in size.
import numpy as np
import tensorflow as tf

# print_progress() and wrap_bytes() are helper functions not shown in this snippet.

def convert(image_paths, out_path):
    # Args:
    # image_paths   List of file-paths for the NumPy array files.
    # out_path      File-path for the TFRecords output file.

    print("Converting: " + out_path)

    # Number of files. Used when printing the progress.
    num_images = len(image_paths)

    # Open a TFRecordWriter for the output-file.
    with tf.python_io.TFRecordWriter(out_path) as writer:
        # Iterate over all the file-paths.
        for i, path in enumerate(image_paths):
            # Print the percentage-progress.
            print_progress(count=i, total=num_images-1)

            # Load the array from the .npy file.
            img = np.load(path)

            # Convert the array to raw bytes (tobytes() replaces the deprecated tostring()).
            img_bytes = img.tobytes()

            # Create a dict with the data we want to save in the
            # TFRecords file. You can add more relevant data here.
            data = {
                'image': wrap_bytes(img_bytes)
            }

            # Wrap the data as TensorFlow Features.
            feature = tf.train.Features(feature=data)

            # Wrap again as a TensorFlow Example.
            example = tf.train.Example(features=feature)

            # Serialize the data.
            serialized = example.SerializeToString()

            # Write the serialized data to the TFRecords file.
            writer.write(serialized)
I think it converts about 200 files, and then I get this:
Converting: tf.recordtrain
- Progress: 3.6%Traceback (most recent call last):
  File "tf_record.py", line 71, in <module>
    out_path=path_tfrecords_train)
  File "tf_record.py", line 54, in convert
    writer.write(serialized)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/lib/io/tf_record.py", line 236, in write
    self._writer.WriteRecord(record, status)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/errors_impl.py", line 528, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.OutOfRangeError: tf.recordtrain; File too large
Any suggestions for solving this problem would be helpful. Thanks.
Answer 0 (score: 0)
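I'm not sure what the size limit for a TFRecords file is, but assuming you have enough disk space, the more common approach is to store the dataset across multiple TFRecords files, for example by writing each group of 20 NumPy files to its own TFRecords file.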
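A minimal sketch of that splitting, assuming the `convert(image_paths, out_path)` function from the question is reused unchanged. The helper names `iter_shards` and `convert_sharded`, the `out_prefix` parameter, and the `-00000-of-00100.tfrecords` naming pattern are illustrative choices and do not come from the question:

```python
def iter_shards(image_paths, out_prefix, files_per_shard=20):
    """Yield (shard_path, chunk_of_paths) pairs, one pair per output file.

    The naming pattern is a hypothetical convention, not something
    TensorFlow requires.
    """
    if files_per_shard < 1:
        raise ValueError("files_per_shard must be at least 1")

    # Ceiling division: the last shard may hold fewer files.
    num_shards = (len(image_paths) + files_per_shard - 1) // files_per_shard

    for shard_index in range(num_shards):
        start = shard_index * files_per_shard
        chunk = image_paths[start:start + files_per_shard]
        shard_path = "{}-{:05d}-of-{:05d}.tfrecords".format(
            out_prefix, shard_index, num_shards)
        yield shard_path, chunk


def convert_sharded(image_paths, out_prefix, convert_fn, files_per_shard=20):
    """Call convert_fn(chunk, shard_path) once per shard.

    convert_fn is expected to have the same signature as convert() in the
    question. Returns the list of shard file paths that were written.
    """
    shard_paths = []
    for shard_path, chunk in iter_shards(image_paths, out_prefix, files_per_shard):
        convert_fn(chunk, shard_path)
        shard_paths.append(shard_path)
    return shard_paths
```

With the question's function this would be called as `convert_sharded(image_paths, "train", convert)`. At 20 files of about 50 MB each, every shard would come out at roughly 1 GB. The returned list of shard paths can then be passed as the filenames argument when reading the data back, for instance to `tf.data.TFRecordDataset`, which accepts a list of files.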