tensorflow TFRecordWriter kills the kernel

Date: 2018-07-09 02:30:24

Tags: python image tensorflow kernel

I am starting to play with TensorFlow and am facing the following problem. I am trying to run an example that does image recognition based on the Stanford Dogs Dataset, and I am stuck at the step that converts the images and labels into TFRecords files. In the image dataset folder there are 120 subfolders, one per breed (label). If I run the code below on only one subfolder it runs fine (although I have not actually tried to read the TFRecords files back). But if I include a second subfolder, the process kills the Python kernel.

Here is the code I am running:

import glob
import tensorflow as tf
from itertools import groupby
from collections import defaultdict

image_filenames = glob.glob(r'C:\Users\Administrator\Documents\Tensorflow\images\n02*\*.jpg')

training_dataset = defaultdict(list)
testing_dataset = defaultdict(list)

# Split up the filename into its breed and corresponding filename. The breed is found by taking the directory name.
image_filename_with_breed = map(lambda filename: (filename.split("\\")[6], filename), image_filenames)

# Group each image by the breed which is the 0th element in the tuple returned above
for dog_breed, breed_images in groupby(image_filename_with_breed, lambda x: x[0]):
    # Enumerate each breed's image and send ~20% of the images to a testing set
    for i, breed_image in enumerate(breed_images):
        if i % 5 == 0:
            testing_dataset[dog_breed].append(breed_image[1])
        else:
            training_dataset[dog_breed].append(breed_image[1])

    # Check that each breed includes at least 18% of the images for testing
    breed_training_count = len(training_dataset[dog_breed])
    breed_testing_count = len(testing_dataset[dog_breed])
    assert round(breed_testing_count / (breed_training_count + breed_testing_count), 2) > 0.18, 'Not enough testing data'

sess = tf.Session()

def write_records_file(dataset, record_location):
    """
    Fill a TFRecords file with the images found in `dataset` and include their category.

    Parameters
    ----------
    dataset : dict(list)
        Dictionary with each key being a label for the list of image filenames in its value.
    record_location : str
        Location to store the TFRecord output.
    """
    writer = None
    # Enumerating the dataset because the current index is used to break up the files
    # if they get over 100 images, to avoid a slowdown in writing.
    current_index = 0
    for breed, images_filenames in dataset.items():
        for image_filename in images_filenames:
            print(image_filename)
            if current_index % 100 == 0:
                if writer:
                    writer.close()

                record_filename = "{record_location}-{current_index}.tfrecords".format(
                    record_location=record_location,
                    current_index=current_index)
                print(record_filename)
                writer = tf.python_io.TFRecordWriter(record_filename)
            current_index += 1

            image_file = tf.read_file(image_filename)

            # In ImageNet dogs, there are a few images which TensorFlow doesn't recognize as JPEGs. This
            # try/except will ignore those images.
            try:
                image = tf.image.decode_jpeg(image_file)
            except:
                print(image_filename)
                continue

            # Converting to grayscale saves processing and memory but isn't required.
            grayscale_image = tf.image.rgb_to_grayscale(image)
            resized_image = tf.image.resize_images(grayscale_image, (250, 151))

            # tf.cast is used here because the resized images are floats but haven't been converted into
            # image floats where an RGB value is between [0,1).
            image_bytes = sess.run(tf.cast(resized_image, tf.uint8)).tobytes()

            # Instead of using the label as a string, it'd be more efficient to turn it into either an
            # integer index or a one-hot encoded rank one tensor.
            # https://en.wikipedia.org/wiki/One-hot
            image_label = breed.encode("utf-8")

            example = tf.train.Example(features=tf.train.Features(feature={
                'label': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_label])),
                'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes]))
            }))
            writer.write(example.SerializeToString())

            writer.close()

write_records_file(testing_dataset, r'C:\Users\Administrator\Documents\Tensorflow\TRF\testing_images')
write_records_file(training_dataset, r'C:\Users\Administrator\Documents\Tensorflow\TRF\training_images')

I monitored the memory usage, and running the script does not seem to consume much memory. I tried it in two virtual machines, one running Ubuntu and the other Windows 2000.

Does anyone have any ideas? Thanks!

1 Answer:

Answer 0 (score: 0)

I found the problem. The error was the indentation of the writer.close() statement: it should have been indented under the first for loop, but it was under the second one. That closed the writer while the inner loop was still iterating, so the next writer.write() call presumably operated on an already-closed writer, which is what killed the kernel.
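
For reference, here is a minimal runnable sketch of a safe arrangement, keeping the variable names from the question. The decode/resize pipeline is replaced by writing the raw file bytes just to keep the example short; the key point is that the final writer.close() runs only after both loops complete:

import tensorflow as tf

def write_records_file(dataset, record_location):
    """Write (label, image bytes) pairs to TFRecords, rotating files every 100 images."""
    writer = None
    current_index = 0
    for breed, images_filenames in dataset.items():
        for image_filename in images_filenames:
            if current_index % 100 == 0:
                # Rotate to a new TFRecords file; close the previous one first.
                if writer:
                    writer.close()
                record_filename = "{record_location}-{current_index}.tfrecords".format(
                    record_location=record_location,
                    current_index=current_index)
                writer = tf.python_io.TFRecordWriter(record_filename)
            current_index += 1

            # For brevity this writes the raw file contents; the question's
            # decode/resize pipeline would go here instead.
            with open(image_filename, 'rb') as f:
                image_bytes = f.read()

            example = tf.train.Example(features=tf.train.Features(feature={
                'label': tf.train.Feature(bytes_list=tf.train.BytesList(value=[breed.encode("utf-8")])),
                'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes]))
            }))
            writer.write(example.SerializeToString())

    # Close the last writer only after every breed and image has been written,
    # so writer.write() is never called on a closed writer.
    if writer:
        writer.close()

This way a writer is only ever closed right before a replacement is opened, or once at the very end.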