Question

According to the Keras documentation, image_dataset_from_directory() returns:

A tf.data.Dataset object. 
- If label_mode is None, it yields float32 tensors of shape (batch_size, image_size[0], image_size[1], num_channels), encoding images (see below for rules regarding num_channels). 
- Otherwise, it yields a tuple (images, labels), where images has shape (batch_size, image_size[0], image_size[1], num_channels), and labels follows the format described below.

Rules regarding labels format: 
- if label_mode is int, the labels are an int32 tensor of shape (batch_size,).
- if label_mode is binary, the labels are a float32 tensor of 1s and 0s of shape (batch_size, 1). 
- if label_mode is categorical, the labels are a float32 tensor of shape (batch_size, num_classes), representing a one-hot encoding of the class index.

My usage:

train_dataset = image_dataset_from_directory(
    directory=TRAIN_DIR,
    labels="inferred",
    label_mode="categorical",
    class_names=["0", "10", "5"],
    image_size=SIZE,
    seed=SEED,
    subset=None,
    interpolation="bilinear",
    follow_links=False,
)

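Even though I set label_mode to "categorical", I get images of shape (None, 224, 224, 3) and labels of shape (None, 3). The batch size is not added to the shape even when I explicitly set batch_size to 32 (32 is the default, but I passed it anyway to see whether it made a difference). Because of this I keep running into problems when training the model, since the TimeDistributed layer needs the batch size to be present.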

#train_dataset.element_spec
(TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name=None),
 TensorSpec(shape=(None, 3), dtype=tf.float32, name=None))

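Edit: I am trying to work out why I get the following error when training a video classification model that uses transfer learning with MobileNetV2 followed by an LSTM, and I assumed the cause was that batch_size is missing from the dataset.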

ValueError: Input 0 of layer sequential_16 is incompatible with the layer: expected ndim=5, found ndim=4. Full shape received: [None, 224, 224, 3]

Model code:

MobileNetV2 function:

def build_mobilenet(shape=INPUT_SHAPE, nbout=CLASSES):
    # INPUT_SHAPE = (224, 224, 3)
    # CLASSES = 3
    model = MobileNetV2(
        include_top=False,
        input_shape=shape,
        weights='imagenet')
    # Fine-tune the pretrained backbone as well
    model.trainable = True
    output = GlobalMaxPool2D()
    return Sequential([model, output])

LSTM function:

def action_model(shape=INSHAPE, nbout=3):
    # INSHAPE = (5, 224, 224, 3)
    convnet = build_mobilenet(shape[1:])
    model = Sequential()
    model.add(TimeDistributed(convnet, input_shape=shape))
    model.add(LSTM(64))
    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(nbout, activation='softmax'))
    return model

Answer 1

This is not a batch size problem; the problem is the format of your input data. Code:

import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Dense, Dropout, LSTM, TimeDistributed

def build_mobilenet(shape=(224,224,3), nbout=3):
    model = tf.keras.applications.MobileNetV2(
        include_top=False,
        input_shape=shape,
        weights='imagenet')
    model.trainable = True
    output = tf.keras.layers.GlobalMaxPool2D()
    return tf.keras.Sequential([model, output])


def action_model(shape=(5, 224, 224, 3), nbout=3):
    # Per-frame input shape: drop the leading time dimension
    convnet = build_mobilenet(shape[1:])
    model = tf.keras.Sequential()
    model.add(TimeDistributed(convnet, input_shape=shape))
    model.add(LSTM(64))
    model.add(Dense(1024, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(.5))
    model.add(Dense(64, activation='relu'))
    model.add(Dense(nbout, activation='softmax'))
    return model

model = action_model()
tf.keras.utils.plot_model(model, 'my_first_model.png', show_shapes=True)

The resulting plot shows that the model expects a 5D tensor as input, but you are supplying a 4D tensor.

The model works with a 5D tensor:

Code:

x = tf.constant(np.random.randint(50, size =(32,5,224,224,3)), dtype = tf.float32)
model(x)

Output:

<tf.Tensor: shape=(32, 3), dtype=float32, numpy=
array([[0.30153075, 0.3630225 , 0.33544672],
       [0.3018494 , 0.36799458, 0.33015603],
       [0.2965148 , 0.36714798, 0.3363372 ],
       [0.30032247, 0.36478844, 0.33488905],
       [0.30106384, 0.36145815, 0.33747798],
       [0.29292756, 0.3652076 , 0.34186485],
       [0.29766476, 0.35945407, 0.34288123],
       [0.29290855, 0.36984667, 0.33724475],
       [0.30804047, 0.35799438, 0.33396518],
       [0.30497718, 0.35853127, 0.33649153],
       [0.29357925, 0.36751047, 0.33891028],
       [0.29514724, 0.36558747, 0.33926526],
       [0.29731706, 0.3684161 , 0.33426687],
       [0.30811843, 0.3656716 , 0.32621   ],
       [0.29937437, 0.36403805, 0.33658758],
       [0.2967953 , 0.36977535, 0.3334294 ],
       [0.30307695, 0.36372742, 0.33319563],
       [0.30148408, 0.36562964, 0.33288625],
       [0.29590267, 0.36651734, 0.33758003],
       [0.29640752, 0.36192682, 0.3416656 ],
       [0.30003947, 0.36704347, 0.332917  ],
       [0.29541495, 0.3681183 , 0.33646676],
       [0.29900452, 0.36397702, 0.33701843],
       [0.3028345 , 0.36404026, 0.33312523],
       [0.30092967, 0.36406764, 0.33500263],
       [0.29969287, 0.36108258, 0.33922455],
       [0.29743004, 0.36917207, 0.3333979 ],
       [0.29056188, 0.3742272 , 0.33521092],
       [0.30297956, 0.36698693, 0.3300335 ],
       [0.29843566, 0.3594078 , 0.3421565 ],
       [0.29280537, 0.36777246, 0.33942217],
       [0.29983717, 0.3691762 , 0.33098662]], dtype=float32)>
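
The batch dimension is present in a batched tf.data.Dataset as well; element_spec only reports it as None because its size is not fixed statically (the last batch may be smaller). A minimal sketch of this, assuming small synthetic arrays in place of real image files (the array sizes are illustrative, not taken from the question):

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-ins for decoded images and one-hot labels (assumed sizes).
images = np.zeros((70, 8, 8, 3), dtype=np.float32)
labels = np.zeros((70, 3), dtype=np.float32)

ds = tf.data.Dataset.from_tensor_slices((images, labels)).batch(32)

# The static spec shows None for the batch axis because its size can vary.
spec_shape = ds.element_spec[0].shape.as_list()
print(spec_shape)

# The batches themselves have a concrete leading dimension.
batch_sizes = [int(x.shape[0]) for x, _ in ds]
print(batch_sizes)

# With drop_remainder=True every batch is full, so the spec is static.
fixed = tf.data.Dataset.from_tensor_slices((images, labels)).batch(
    32, drop_remainder=True)
fixed_shape = fixed.element_spec[0].shape.as_list()
print(fixed_shape)
```

So the None in (None, 224, 224, 3) already is the batch axis; what the model is missing is the time axis.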

The image_dataset_from_directory function you are using cannot produce 5D tensors. You need a custom data generator that builds 5D tensors from your data.
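
One way such a generator could look is sketched below with NumPy only. The helper clip_batches is hypothetical, not part of Keras. It assumes the frames are already loaded in order, that every run of frames_per_clip consecutive frames comes from the same video, and that the first frame's label can serve as the label of the whole clip. The demo uses 32x32 frames to keep memory low; real data would be 224x224.

```python
import numpy as np

def clip_batches(frames, labels, frames_per_clip=5, batch_size=32):
    """Yield (clips, clip_labels); clips has shape (batch, time, H, W, C).

    Hypothetical helper. Assumes consecutive frames belong to the same
    video and labels each clip with the label of its first frame.
    """
    n_clips = len(frames) // frames_per_clip
    usable = n_clips * frames_per_clip  # drop leftover frames at the end
    clips = frames[:usable].reshape(
        (n_clips, frames_per_clip) + frames.shape[1:])
    clip_labels = labels[:usable:frames_per_clip]
    for start in range(0, n_clips, batch_size):
        stop = start + batch_size
        yield clips[start:stop], clip_labels[start:stop]

# Synthetic data: 200 frames, i.e. 40 clips of 5 frames, 3 classes.
frames = np.zeros((200, 32, 32, 3), dtype=np.float32)
video_ids = np.arange(200) // 5
labels = np.eye(3, dtype=np.float32)[video_ids % 3]

first_clips, first_labels = next(clip_batches(frames, labels))
print(first_clips.shape, first_labels.shape)
```

With 224x224 frames, each batch would have the (batch, 5, 224, 224, 3) layout that the TimeDistributed input of action_model expects.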
