How do I load an image dataset from my computer and split it into two datasets for training and testing?

Date: 2019-02-28 18:54:21

Tags: python

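Image data description: 2D binary images of size 200x200.

There are 123 labels, and each class (label) contains 10 image frames. I think the first 4 images of each class would be the training dataset and the rest the test dataset.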

data_Path = 'C:\GaitDatasetB-silh_PerfectlyAlingedImages_Active_EnergyImage\'

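In the code below, the built-in mnist dataset is loaded at the point where I want to load my own image dataset for classification.

What should I do?

How do I load an image dataset from my computer and split it into two datasets for training and testing, as described above?

Python code: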

    import keras
    from keras.datasets import mnist
    from keras.models import Sequential
    from keras.layers import Dense, Dropout, Flatten
    from keras.layers import Conv2D, MaxPooling2D
    import numpy as np

    batch_size = 128
    num_classes = 10
    epochs = 12

    # input image dimensions
    img_rows, img_cols = 28, 28

    # the data, split between train and test sets
    (x_train, y_train), (x_test, y_test) = mnist.load_data() # I want to load data from data_Path='C:\GaitDatasetB-silh_PerfectlyAlingedImages_Active_EnergyImage\'

    x_train = x_train.reshape(60000,28,28,1)
    x_test = x_test.reshape(10000,28,28,1)

    print('x_train shape:', x_train.shape)
    print(x_train.shape[0], 'train samples')
    print(x_test.shape[0], 'test samples')

Code reference: https://towardsdatascience.com/build-your-own-convolution-neural-network-in-5-mins-4217c2cf964f

1 answer:

Answer 0: (score: 0)

There are packages for working with image data. skimage.io.imread returns an ndarray, which works well with keras. So you can read the data like this:

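    import os
    import numpy as np
    from skimage import io

    all_images = []
    for file_name in sorted(os.listdir(path)):
        # os.listdir returns bare file names, so join them with the folder
        img = io.imread(os.path.join(path, file_name), as_gray=True)
        img = img.reshape([WIDTH, HEIGHT, 1])
        all_images.append(img)
    x_train = np.array(all_images)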

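Now your training data is ready. You also need to build an array of labels, which I will call y_train. You can convert it to one-hot encoding like this: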

    y_train = keras.utils.to_categorical(y_train, num_classes)

Everything else is the same as in the MNIST example.
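The split described in the question (first 4 images of each class for training, the rest for testing) could be sketched roughly as follows. This is only a sketch under assumptions: it assumes one sub-folder per subject under the dataset folder, and the helper names `split_file_list` and `load_images`, the sorted file order, and the scaling by 255 are illustrative choices, not something stated in the question.

```python
import os
import numpy as np


def split_file_list(root, n_train=4):
    """Return (train, test) lists of (file_path, label) pairs.

    Assumes one sub-folder per subject under `root`. The first `n_train`
    files of each subject (in sorted order) go to train, the rest to test.
    """
    train, test = [], []
    subjects = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )
    for label, subject in enumerate(subjects):
        folder = os.path.join(root, subject)
        for i, name in enumerate(sorted(os.listdir(folder))):
            item = (os.path.join(folder, name), label)
            (train if i < n_train else test).append(item)
    return train, test


def load_images(items, size=(200, 200)):
    """Read grayscale images into an array of shape (N, rows, cols, 1)."""
    import cv2  # imported lazily so the listing step works without OpenCV

    # cv2.imread returns None for unreadable files; this sketch assumes
    # every path points to a valid image of the given size.
    images = [cv2.imread(path, cv2.IMREAD_GRAYSCALE) for path, _ in items]
    x = np.array(images, dtype="float32").reshape(len(items), size[0], size[1], 1)
    y = np.array([label for _, label in items])
    return x / 255.0, y  # assumption: pixel values are in the 0..255 range


# Hypothetical usage, with data_path pointing at the dataset folder:
# train_items, test_items = split_file_list(data_path)
# x_train, y_train = load_images(train_items)
# x_test, y_test = load_images(test_items)
```

The integer labels returned here can then be passed to keras.utils.to_categorical with num_classes set to the number of subjects, and the image dimensions in the MNIST example would need to be 200, 200 instead of 28, 28.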

I prepared this code based on your suggestion:

    path1 = 'C:\\Data\\For new Paper3\\Old\\GaitDatasetB-silh_PerfectlyAlingedImages_EnergyImage\\'
    all_images = []
    subjects = os.listdir(path1)
    numberOfSubject = len(subjects)
    print('Number of Subjects: ', numberOfSubject)
    for number1 in range(0, numberOfSubject):  # numberOfSubject
        path2 = (path1 + subjects[number1] + '/')
        sequences = os.listdir(path2);
        numberOfsequences = len(sequences)
        for number2 in range(4, numberOfsequences):
            path3 = path2 + sequences[number2]
            img = cv2.imread(path3 , 0)
            img = img.reshape(200, 200, 1)
            all_images.append(img)
    x_train = np.array(all_images)

    y_train = keras.utils.to_categorical(y_train, num_classes)

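But the last line of the code raises an error:

y_train = keras.utils.to_categorical(y_train, num_classes) NameError: name 'y_train' is not defined

What should I do to store the labels in the y_train variable? Within each run of the second (inner) for loop, all images should get the same label.

Please follow my code so that it can be plugged into the CNN procedure. https://towardsdatascience.com/build-your-own-convolution-neural-network-in-5-mins-4217c2cf964f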
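The NameError appears because y_train is passed to to_categorical before anything has been assigned to it. One possible fix, sketched here with the file reading left out, is to append the subject index to a label list at the same point where each image is appended. The counts (123 subjects, 10 images each) come from the question; the variable names `all_labels`, `subject_index`, and `image_index` are illustrative. Note also that num_classes is 10 in the MNIST example, while this dataset would need 123.

```python
import numpy as np

num_subjects = 123        # number of classes stated in the question
images_per_subject = 10   # images per class stated in the question

all_labels = []
for subject_index in range(num_subjects):
    for image_index in range(4, images_per_subject):
        # In the real loop, the image is read and appended to all_images here.
        # Appending the subject index at the same point keeps images and
        # labels aligned, so every image of one subject gets the same label.
        all_labels.append(subject_index)

y_train = np.array(all_labels)

# One-hot encoding with NumPy; keras.utils.to_categorical(y_train, num_subjects)
# yields the same kind of (samples, classes) matrix.
y_train_one_hot = np.eye(num_subjects, dtype="float32")[y_train]
print(y_train_one_hot.shape)  # (738, 123)
```

In the original loop this amounts to creating an empty list next to all_images and calling append(number1) right after all_images.append(img).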