Wrong array shape in TensorFlow

Time: 2018-01-01 16:09:59

Tags: python numpy tensorflow

I'm fairly new to TensorFlow and ML, and I'm trying to load a dataset from a pickle file. My dataset is a list of two lists. The first list holds 10,000 images, each represented by an array of 3072 bytes: 1024 bytes per color channel (RGB). The other list holds 10,000 booleans. I load my dataset like this:

X, Y = pickle.load(open('training_dataset.pkl', 'rb'))

Then I create my network with the following code:

from tflearn.layers.core import input_data

network = input_data(shape=[None, 32, 32, 3])

and I get: ValueError: Cannot feed value of shape (96, 3072) for Tensor 'InputData/X:0', which has shape '(?, 32, 32, 3)'

How can I reshape my dataset to [?, 32, 32, 3]? Is my pickle file not formatted correctly?
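(The (96, 3072) in the error is one batch of 96 images that are still flat 3072-byte rows. As a minimal NumPy sketch of the reshape the error is asking for, assuming X is the list of 10,000 flat image arrays described above, in CIFAR-10's channel-plane order of 1024 red bytes, then 1024 green, then 1024 blue:

import numpy as np

X = np.asarray(X)             # (10000, 3072), one flat row per image
X = X.reshape(-1, 3, 32, 32)  # split each row into 3 channel planes of 32x32
X = X.transpose(0, 2, 3, 1)   # reorder axes to (10000, 32, 32, 3)

)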

Here is the code used to create the pickle file:

import pickle


def unpickle(file_name):
    with open(file_name, 'rb') as opened_file:
        data = pickle.load(opened_file, encoding='bytes')
    return data


def create_training_pkl_file():
    img_arrays_list = []
    is_bird_boolean_list = []
    training_dataset = []

    for i in range(1,6):
        batch = unpickle('./cifar-10-batches-py/data_batch_' + str(i))
        for img in batch[b'data']:
            img_arrays_list.append(img)

        for label in batch[b'labels']:
            is_bird_boolean_list.append(label == 2)  # class 2 is "bird" in CIFAR-10

    training_dataset.append(img_arrays_list)
    training_dataset.append(is_bird_boolean_list)

    save_pickle(training_dataset, './training_dataset.pkl')

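(save_pickle is not shown in the question; presumably it is just a thin wrapper around pickle.dump. A hypothetical sketch, with the name taken from the call above:

def save_pickle(data, file_name):
    with open(file_name, 'wb') as opened_file:
        pickle.dump(data, opened_file)  # hypothetical helper: serialize the dataset to disk

)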
I am using the CIFAR-10 dataset.

1 Answer:

Answer 0 (score: 1)

Here is a simple class that should solve your problem nicely. It may look verbose, but it is easy to call when you execute the dataflow graph.

import os
import pickle

import numpy as np

cwd = os.getcwd() # Should be the same as the directory where you extracted the CIFAR-10 dataset

class DATA:
    def __init__(self, directory=cwd):
        self._directory = directory

        self._training_data = []
        self._training_labels = []       
        self._load_training_data()

        # hold out a reproducible 10% of the samples (this removed slice
        # could serve as a validation set)
        np.random.seed(0)
        samples_n = self._training_labels.shape[0]
        random_indices = np.random.choice(samples_n, samples_n // 10,
                                          replace=False)
        np.random.seed()  # reseed from entropy so later shuffles differ per run

        self._training_data = np.delete(self._training_data, random_indices,
                                        axis=0)
        self._training_labels = np.delete(self._training_labels,
                                          random_indices)


    def _load_training_data(self):
        for i in range(1, 6):
            path = os.path.join(self._directory, "data_batch_" + str(i))
            with open(path, 'rb') as fd:
                cifar_data = pickle.load(fd, encoding = "bytes")
                imgs = cifar_data[b"data"].reshape([-1, 3, 32, 32])  # un-flatten each 3072-byte row into 3 channel planes of 32x32
                imgs = imgs.transpose([0, 2, 3, 1])  # reorder (N, 3, 32, 32) to (N, 32, 32, 3) to match the network's input shape
                if i == 1:
                    self._training_data = imgs
                    self._training_labels = cifar_data[b"labels"]
                else:
                    self._training_data = np.concatenate([self._training_data, imgs], axis=0)
                    self._training_labels = np.concatenate([self._training_labels, cifar_data[b"labels"]])

    def get_training_batch(self, batch_size):
        return self._get_batch(self._training_data, self._training_labels, batch_size)

    def _get_batch(self, data, labels, batch_size):
        samples_n = labels.shape[0]
        if batch_size <= 0:
            batch_size = samples_n

        random_indices = np.random.choice(samples_n, samples_n, replace=False)  # random permutation: shuffles the data each epoch
        data = data[random_indices]
        labels = labels[random_indices]
        for i in range(samples_n // batch_size):
            on = i * batch_size
            off = on + batch_size
            yield data[on:off], labels[on:off]

Create an instance of the DATA class:

dataset = DATA()

Get a batch of training data and its corresponding labels (batch_size is an integer of your choosing):

training_data, training_labels = next(dataset.get_training_batch(batch_size))
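Since get_training_batch is a generator, you can also iterate it directly to train over a full epoch. A minimal sketch (the batch size of 96 is just an example, matching the error message in the question):

dataset = DATA()
for images, labels in dataset.get_training_batch(96):
    # images.shape is (96, 32, 32, 3), which matches the
    # [None, 32, 32, 3] input_data placeholder from the question
    print(images.shape, labels.shape)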

I'm on the same learning curve as you, so if you need more details about the code, you can refer here.

Hope that helps!