I am playing around with TensorFlow and I am facing the following problem. I am trying to run an example that does image recognition based on the Stanford Dog Dataset, and I am stuck at the step that converts the images and labels into TFRecords files. In the image dataset folder there are 120 subfolders, one per breed (label). If I run the code below on only one subfolder it works fine (although I have not actually tried reading the resulting TFRecords file back). But if I include a second subfolder, the process kills the Python kernel process.
Here is the code I am running:
import glob
import tensorflow as tf
from itertools import groupby
from collections import defaultdict
image_filenames = glob.glob(r'C:\Users\Administrator\Documents\Tensorflow\images\n02*\*.jpg')
training_dataset = defaultdict(list)
testing_dataset = defaultdict(list)
# Split up the filename into its breed and corresponding filename. The breed is found by taking the directory name.
image_filename_with_breed = map(lambda filename: (filename.split("\\")[6], filename), image_filenames)
# Group each image by the breed which is the 0th element in the tuple returned above
for dog_breed, breed_images in groupby(image_filename_with_breed, lambda x: x[0]):
    # Enumerate each breed's image and send ~20% of the images to a testing set
    for i, breed_image in enumerate(breed_images):
        if i % 5 == 0:
            testing_dataset[dog_breed].append(breed_image[1])
        else:
            training_dataset[dog_breed].append(breed_image[1])
    # Check that each breed includes at least 18% of the images for testing
    breed_training_count = len(training_dataset[dog_breed])
    breed_testing_count = len(testing_dataset[dog_breed])
    assert round(breed_testing_count / (breed_training_count + breed_testing_count), 2) > 0.18, 'Not enough testing data'
sess = tf.Session()
def write_records_file(dataset, record_location):
    """
    Fill a TFRecords file with the images found in `dataset` and include their category.

    Parameters
    ----------
    dataset : dict(list)
        Dictionary with each key being a label for the list of image filenames of its value.
    record_location : str
        Location to store the TFRecord output.
    """
    writer = None

    # Enumerating the dataset because the current index is used to break up the files if they get over 100
    # images to avoid a slowdown in writing.
    current_index = 0
    for breed, images_filenames in dataset.items():
        for image_filename in images_filenames:
            print(image_filename)
            if current_index % 100 == 0:
                if writer:
                    writer.close()
                record_filename = "{record_location}-{current_index}.tfrecords".format(
                    record_location=record_location,
                    current_index=current_index)
                print(record_filename)
                writer = tf.python_io.TFRecordWriter(record_filename)
            current_index += 1

            image_file = tf.read_file(image_filename)

            # In ImageNet dogs, there are a few images which TensorFlow doesn't recognize as JPEGs. This
            # try/except will ignore those images.
            try:
                image = tf.image.decode_jpeg(image_file)
            except:
                print(image_filename)
                continue

            # Converting to grayscale saves processing and memory but isn't required.
            grayscale_image = tf.image.rgb_to_grayscale(image)
            resized_image = tf.image.resize_images(grayscale_image, (250, 151))

            # tf.cast is used here because the resized images are floats but haven't been converted into
            # image floats where an RGB value is between [0,1).
            image_bytes = sess.run(tf.cast(resized_image, tf.uint8)).tobytes()

            # Instead of using the label as a string, it'd be more efficient to turn it into either an
            # integer index or a one-hot encoded rank one tensor.
            # https://en.wikipedia.org/wiki/One-hot
            image_label = breed.encode("utf-8")

            example = tf.train.Example(features=tf.train.Features(feature={
                'label': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_label])),
                'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes]))
            }))

            writer.write(example.SerializeToString())
            writer.close()
write_records_file(testing_dataset, r'C:\Users\Administrator\Documents\Tensorflow\TRF\testing_images')
write_records_file(training_dataset, r'C:\Users\Administrator\Documents\Tensorflow\TRF\training_images')
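As mentioned above, I have not yet tried reading the TFRecords back. A quick sanity check for the output (a minimal sketch assuming TensorFlow 1.x and its tf.python_io API; the shard filename below is just an example of the naming scheme produced by the code above) would be to count the serialized examples in one of the generated files:

import tensorflow as tf

# Count the tf.train.Example records written to one shard (the filename is an assumption
# based on the "{record_location}-{current_index}.tfrecords" pattern used above).
shard = r'C:\Users\Administrator\Documents\Tensorflow\TRF\testing_images-0.tfrecords'
record_count = sum(1 for _ in tf.python_io.tf_record_iterator(shard))
print('records in shard:', record_count)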
I monitored the memory usage, and running the script does not seem to consume much memory. I tried it in two virtual machines, one running Ubuntu and the other running Windows 2000.
Does anyone have an idea? Thanks!
Answer 0 (score: 0):
I found the problem. The error was the indentation of the writer.close() statement. It should have been indented at the level of the first for loop, but it was indented inside the second one.
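To make the placement concrete, here is a minimal, self-contained toy version of the same control flow (plain text files and made-up data instead of TFRecords and images), showing where the final close() belongs:

def write_shards(dataset, shard_prefix, shard_size=100):
    """dataset maps a label to a list of strings, mirroring the breed -> filenames dict above."""
    writer = None
    current_index = 0
    for label, items in dataset.items():          # first for loop
        for item in items:                        # second for loop
            if current_index % shard_size == 0:
                if writer:
                    writer.close()
                writer = open('{}-{}.txt'.format(shard_prefix, current_index), 'w')
            current_index += 1
            writer.write('{} {}\n'.format(label, item))
    # The final close belongs here, indented like the first for loop, so the current
    # shard stays open until every item has been written.
    if writer:
        writer.close()

write_shards({'breed_a': ['img1.jpg', 'img2.jpg'], 'breed_b': ['img3.jpg']}, 'toy_shards')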