I created the following Python program which, as far as I can tell, should be a valid CTC-based model together with training data. The best documentation I could find is the CNTK_208_Speech_CTC tutorial, which is what I based this on. The program is as simple as I could make it: it depends only on numpy and CNTK, and generates its own data.
When I run it, I get the following error:
Validating --> ForwardBackward2850 = ForwardBackward(LabelsToGraph2847, StableSigmoid2703) : [5 x labelAxis1], [5 x inputAxis1] -> []
RuntimeError: Matrix dimension mismatch in the ForwardBackwardNode operation.
This ticket appears to be the same issue: https://github.com/Microsoft/CNTK/issues/2156
Here is the Python program:
# cntk_ctc_hello_world.py
#
# This is a "hello world" example of using CTC (Connectionist Temporal Classification) with CNTK.
#
# The input is a sequence of vectors of size 17. We use 17 because it's easy to spot that number in
# error messages. The output is a string of codes, each code being one of 4 possible characters from
# our alphabet that we'll refer to here as "ABCD", although they're actually just represented
# by the numbers 0..3, which is typical for classification systems. To make the setup of training data
# trivial, we assign the first four elements of our 17-dimensional input vector to the four characters
# of our alphabet, so that the matching is:
# 10000000000000000 A
# 01000000000000000 B
# 00100000000000000 C
# 00010000000000000 D
# In our input sequences, we repeat each code three to five times, followed by three to five codes
# containing random noise. Whether it's repeated 3, 4, or 5 times is random for each code and each
# spacer. When we emit one of our codes, we fill the first 4 values with the code, and the remaining
# 13 values with random noise.
# For example:
# Input: AAA-----CCCC---DDDDD
# Output: ACD
import cntk as C
import numpy as np
import random
import sys
InputDim = 17
NumClasses = 4 # A,B,C,D
MinibatchSize = 100
MinibatchPerEpoch = 50
NumEpochs = 10
MaxOutputSeqLen = 10 # ABCDABCDAB
inputAxis = C.Axis.new_unique_dynamic_axis('inputAxis')
labelAxis = C.Axis.new_unique_dynamic_axis('labelAxis')
inputVar = C.sequence.input_variable((InputDim), sequence_axis=inputAxis, name="input")
labelVar = C.sequence.input_variable((NumClasses+1), sequence_axis=labelAxis, name="labels")
# Construct an LSTM-based model that will perform the classification
with C.default_options(activation=C.sigmoid):
    classifier = C.layers.Sequential([
        C.layers.For(range(3), lambda: C.layers.Recurrence(C.layers.LSTM(128))),
        C.layers.Dense(NumClasses + 1)
    ])(inputVar)
criteria = C.forward_backward(C.labels_to_graph(labelVar), classifier, blankTokenId=NumClasses, delayConstraint=3)
err = C.edit_distance_error(classifier, labelVar, squashInputs=True, tokensToIgnore=[NumClasses])
lr = C.learning_rate_schedule([(3, .01), (1,.001)], C.UnitType.sample)
mm = C.momentum_schedule([(1000, 0.9), (0, 0.99)], MinibatchSize)
learner = C.momentum_sgd(classifier.parameters, lr, mm)
trainer = C.Trainer(classifier, (criteria, err), learner)
# Return a numpy array of 17 elements, for this code
def make_code(code):
    a = np.zeros(NumClasses)                  # 0,0,0,0
    v = np.random.rand(InputDim - NumClasses) # 13x random
    a = np.concatenate((a, v))
    a[code] = 1
    return a
def make_noise_code():
    return np.random.rand(InputDim)
def make_onehot(code):
    v = np.zeros(NumClasses+1)
    v[code] = 1
    return v
def gen_batch():
    x_batch = []
    y_batch = []
    for mb in range(MinibatchSize):
        yLen = random.randint(1, MaxOutputSeqLen)
        x = []
        y = []
        for i in range(yLen):
            code = random.randint(0,3)
            y.append(make_onehot(code))
            xLen = random.randint(3,5) # Input is 3 to 5 repetitions of the code
            for j in range(xLen):
                x.append(make_code(code))
            spacerLen = random.randint(3,5) # Spacer is 3 to 5 repetitions of noise
            for j in range(spacerLen):
                x.append(make_noise_code())
        x_batch.append(np.array(x, dtype='float32'))
        y_batch.append(np.array(y, dtype='float32'))
    return x_batch, y_batch
#######################################################################################
# Dump first X/Y training pair from minibatch
#x, y = gen_batch()
#print("\nx sequence of first sample of minibatch:\n", x[0])
#print("\ny sequence of first sample of minibatch:\n", y[0])
#######################################################################################
progress_printer = C.logging.progress_print.ProgressPrinter(tag='Training', num_epochs=NumEpochs)
for epoch in range(NumEpochs):
    for mb in range(MinibatchPerEpoch):
        x_batch, y_batch = gen_batch()
        trainer.train_minibatch({inputVar: x_batch, labelVar: y_batch})
    progress_printer.epoch_summary(with_metric=True)
Answer 0 (score: 0)
For anyone who runs into this error, there are two things to watch for:
(1) The label sequence tensor fed to labels_to_graph must have the same sequence length at runtime as the network output.
(2) If the dynamic sequence axis of the input sequence tensor is changed during model construction (for example by striding over the sequence axis), you must call reconcile_dynamic_axes on the label sequence tensor, passing the network output as the second argument. This tells CNTK that the labels share the same dynamic axis as the network.
Sticking to these two points will allow forward_backward to run; a sketch of both fixes is shown below.
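A minimal sketch of how these two points could be applied to the program from the question. The blank-padding scheme in part (1) and the name labels_aligned are assumptions made here for illustration, not something stated in the answer:

# (1) Inside gen_batch(), after building x and y for one sample, pad the label
#     sequence with blank tokens so it is as long as the input sequence.
#     (Assumption: padding with the blank id is one way to satisfy point (1).)
while len(y) < len(x):
    y.append(make_onehot(NumClasses))   # NumClasses is used as the blank token id

# (2) Reconcile the label tensor's dynamic axis with the network output before
#     building the CTC criterion.
labels_aligned = C.reconcile_dynamic_axes(labelVar, classifier)
graph = C.labels_to_graph(labels_aligned)
criteria = C.forward_backward(graph, classifier,
                              blankTokenId=NumClasses, delayConstraint=3)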