I want to split an image dataset into train, test, and validation sets. I am using the tf.data API, but I don't know how to use the split data with tf.data.
from random import shuffle
import glob
import os
import cv2
import numpy as np
import sys

shuffle_data = True  # shuffle the addresses before splitting
data_root = 'I:/pattern/Data2/train'

# Read image addresses from the 'train' folder.
# Assumption: images are stored as data_root/<class_name>/<image file>, as in the linked tutorial.
addrs = sorted(glob.glob(os.path.join(data_root, '*', '*')))

# Map each class (folder) name to an integer index, then label every image by its parent folder.
label_names = sorted({os.path.basename(os.path.dirname(a)) for a in addrs})
label_to_index = {name: i for i, name in enumerate(label_names)}
labels = [label_to_index[os.path.basename(os.path.dirname(a))] for a in addrs]

# Shuffle addresses and labels together so each pair stays aligned
if shuffle_data:
    c = list(zip(addrs, labels))
    shuffle(c)
    addrs, labels = zip(*c)

# Divide the data into 60% train, 20% validation, and 20% test
n_train = int(0.6 * len(addrs))
n_val = int(0.8 * len(addrs))
train_addrs, train_labels = addrs[:n_train], labels[:n_train]
val_addrs, val_labels = addrs[n_train:n_val], labels[n_train:n_val]
test_addrs, test_labels = addrs[n_val:], labels[n_val:]
I used the code from this link to build the tf.data dataset: https://www.tensorflow.org/tutorials/load_data/images
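One possible way to connect the split to tf.data is to build a separate dataset from each pair of path and label lists. The following is only a minimal sketch under some assumptions: TensorFlow 2.4 or later (for `tf.data.AUTOTUNE`), JPEG image files, and integer labels. The names `IMG_SIZE`, `BATCH_SIZE`, `load_and_preprocess`, and `make_dataset` are hypothetical helpers made up for illustration, not part of the tutorial or of TensorFlow.

```python
import tensorflow as tf

IMG_SIZE = 192   # assumed target size; use whatever your model expects
BATCH_SIZE = 32  # assumed batch size


def load_and_preprocess(path, label):
    """Read one image file, decode it, resize it, and scale pixels to [0, 1]."""
    image = tf.io.read_file(path)
    image = tf.image.decode_jpeg(image, channels=3)  # assumes JPEG files
    image = tf.image.resize(image, [IMG_SIZE, IMG_SIZE])
    return image / 255.0, label


def make_dataset(addrs, labels, training=False):
    """Build a batched tf.data.Dataset from parallel sequences of paths and labels."""
    ds = tf.data.Dataset.from_tensor_slices((list(addrs), list(labels)))
    if training:
        # Shuffle the (cheap) path/label pairs before decoding; reshuffles each epoch.
        ds = ds.shuffle(buffer_size=len(addrs))
    ds = ds.map(load_and_preprocess, num_parallel_calls=tf.data.AUTOTUNE)
    return ds.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
```

For example, `val_ds = make_dataset(val_addrs, val_labels)` would give the validation dataset; the training and test datasets can be built the same way from their own lists, with `training=True` only for the training one. Assuming a Keras model named `model` (not shown in the question), the results could then be passed as `model.fit(train_ds, validation_data=val_ds)` and `model.evaluate(test_ds)`.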