I'm actually new to TensorFlow and ML, and I'm trying to load a dataset from a pickle file. My dataset is a list of 2 lists. The first list holds 10,000 images, each represented as an array of 3072 bytes: 1024 per color channel (RGB). The other list holds 10,000 booleans. I load my dataset like this:
X, Y = pickle.load(open('training_dataset.pkl', 'rb'))
Then I create my network with:
network = input_data(shape=[None, 32, 32, 3])
and get: ValueError: Cannot feed value of shape (96, 3072) for Tensor 'InputData/X:0', which has shape '(?, 32, 32, 3)'
How can I reshape my dataset to [?, 32, 32, 3]? Is my pickle file formatted incorrectly?
Here is the code used to create the pickle file:
import pickle

def unpickle(file_name):
    with open(file_name, 'rb') as opened_file:
        data = pickle.load(opened_file, encoding='bytes')
    return data
def create_training_pkl_file():
    img_arrays_list = []
    is_bird_boolean_list = []
    training_dataset = []
    for i in range(1, 6):
        batch = unpickle('./cifar-10-batches-py/data_batch_' + str(i))
        for img in batch[b'data']:
            img_arrays_list.append(img)
        for label in batch[b'labels']:
            is_bird_boolean_list.append(label == 2)
    training_dataset.append(img_arrays_list)
    training_dataset.append(is_bird_boolean_list)
    save_pickle(training_dataset, './training_dataset.pkl')
I'm using the CIFAR-10 dataset.
Answer 0 (score: 1)
Here is a simple class that should best solve your problem. It may look verbose, but it makes the calls easy when you execute the dataflow graph.
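Before the full class, the heart of the fix: CIFAR-10 stores each image as a flat 3072-byte row in channel-major order (1024 red values, then 1024 green, then 1024 blue), so converting to (N, 32, 32, 3) takes a reshape followed by a transpose. A minimal standalone sketch with NumPy, using a synthetic array in place of your X:

```python
import numpy as np

# Synthetic stand-in for the pickled data: 10 flat rows of 3072 values.
flat = np.arange(10 * 3072).reshape(10, 3072)

# Step 1: unflatten each row into (channels, height, width) = (3, 32, 32).
# Step 2: move the channel axis last to get (height, width, channels).
imgs = flat.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
print(imgs.shape)  # (10, 32, 32, 3)

# Pixel (r, c) of image n keeps its red value from flat[n, r*32 + c],
# its green value from 1024 positions later, and blue from 2048 later.
assert imgs[0, 0, 0, 0] == flat[0, 0]     # R
assert imgs[0, 0, 0, 1] == flat[0, 1024]  # G
assert imgs[0, 0, 0, 2] == flat[0, 2048]  # B
```

A plain `X.reshape(-1, 32, 32, 3)` would also run without error, but it scrambles the channels; that is why the transpose matters.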
import os
import pickle
import numpy as np

cwd = os.getcwd()  # Should be the same directory where you extracted the CIFAR-10 dataset

class DATA:
    def __init__(self, directory="./"):
        self._directory = directory
        self._training_data = []
        self._training_labels = []
        self._load_training_data()

        # Remove a random 10% of the samples (e.g. to reserve for validation).
        np.random.seed(0)
        samples_n = self._training_labels.shape[0]
        random_indices = np.random.choice(samples_n, samples_n // 10,
                                          replace=False)
        np.random.seed()
        self._training_data = np.delete(self._training_data, random_indices,
                                        axis=0)
        self._training_labels = np.delete(self._training_labels,
                                          random_indices)
    def _load_training_data(self):
        for i in range(1, 6):
            path = os.path.join(self._directory, "data_batch_" + str(i))
            with open(path, 'rb') as fd:
                cifar_data = pickle.load(fd, encoding="bytes")
                # Unflatten each 3072-byte row into (channels, height, width).
                imgs = cifar_data[b"data"].reshape([-1, 3, 32, 32])
                # Move the channel axis last: (N, 32, 32, 3), matching the input layer.
                imgs = imgs.transpose([0, 2, 3, 1])
                if i == 1:
                    self._training_data = imgs
                    self._training_labels = cifar_data[b"labels"]
                else:
                    self._training_data = np.concatenate([self._training_data, imgs], axis=0)
                    self._training_labels = np.concatenate([self._training_labels, cifar_data[b"labels"]])
    def get_training_batch(self, batch_size):
        return self._get_batch(self._training_data, self._training_labels, batch_size)

    def _get_batch(self, data, labels, batch_size):
        samples_n = labels.shape[0]
        if batch_size <= 0:
            batch_size = samples_n
        random_indices = np.random.choice(samples_n, samples_n, replace=False)
        data = data[random_indices]
        labels = labels[random_indices]
        for i in range(samples_n // batch_size):
            on = i * batch_size
            off = on + batch_size
            yield data[on:off], labels[on:off]
Create an instance of the DATA class:
dataset = DATA()
Get a batch of training data and its corresponding labels:
training_data,training_labels = next(dataset.get_training_batch(batch_size))
I'm on the same learning curve as you, so if you need more details about the code you can refer here.
Hope that helps!