Question

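I am trying to create a sequential neural network whose output is 12 non-exclusive probabilities (probability of A, probability of B, probability of C, ...). My network seems to learn the most common output and always predicts it for every input. All of my output values are always "1" or "0" with nothing in between, and the same values always appear in the same positions (details below).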

I am far from an ML expert, so the solution might be something simple.

I have tried different batch sizes (from 8 to 128) and many different loss functions, but nothing seems to help.

How I create the model with Keras:

from keras.models import Sequential
from keras.layers import Dense, LeakyReLU

model = Sequential()
model.add( Dense( 150, input_dim=9600, activation='relu') )
model.add( LeakyReLU(alpha=.01) )
model.add( Dense( 50, activation='relu') )
model.add( LeakyReLU(alpha=.01) )
model.add( Dense( 12, activation='sigmoid') )

metrics_to_output=[ 'accuracy' ]
# I've tried many loss functions, not just mean_squared_error
model.compile( loss='mean_squared_error', optimizer='adam', metrics=metrics_to_output )

This probably does not matter, but here is how I prepare the data and train the model. I have also tried using train_on_batch:

import numpy

def generate_data_from_files( file1, file2 ):
    input = numpy.load( file1, allow_pickle=True )
    output = numpy.load( file2, allow_pickle=True )

    # The file only has 2 values, and I generate 12 probabilities derived from those 2 values
    transformed_output = output.copy()
    new_shape = ( output.shape[ 0 ], 12 )
    transformed_output.resize( new_shape )

    for x in range( 0, len( output ) ):
        #First 6 probabilities model the value of output[ x ][ 0 ]
        transformed_output[ x ][ 0 ] = 1 if output[ x ][ 0 ] <= -5.0 else 0
        transformed_output[ x ][ 1 ] = 1 if output[ x ][ 0 ] <= -3.0 else 0
        transformed_output[ x ][ 2 ] = 1 if output[ x ][ 0 ] <= -1.0 else 0
        transformed_output[ x ][ 3 ] = 1 if output[ x ][ 0 ] >= 1.0 else 0
        transformed_output[ x ][ 4 ] = 1 if output[ x ][ 0 ] >= 3.0 else 0
        transformed_output[ x ][ 5 ] = 1 if output[ x ][ 0 ] >= 5.0 else 0
        #Second 6 probabilities model the value of output[ x ][ 1 ]
        transformed_output[ x ][ 6 ] = 1 if output[ x ][ 1 ] <= -5.0 else 0
        transformed_output[ x ][ 7 ] = 1 if output[ x ][ 1 ] <= -3.0 else 0
        transformed_output[ x ][ 8 ] = 1 if output[ x ][ 1 ] <= -1.0 else 0
        transformed_output[ x ][ 9 ] = 1 if output[ x ][ 1 ] >= 1.0 else 0
        transformed_output[ x ][ 10] = 1 if output[ x ][ 1 ] >= 3.0 else 0
        transformed_output[ x ][ 11] = 1 if output[ x ][ 1 ] >= 5.0 else 0
    return input, transformed_output


input, output = generate_data_from_files( file1, file2 )
model.fit( x=input, y=output, batch_size=8, epochs=1 )

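I expect to get 12 values ranging from 0 to 1, each modeling a probability. However, when I use the network to predict (even on the training data), I always get the same output: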

0 1 1 0 0 0 0 0 0 0 0 0

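This is a reasonable average guess, because the second and third booleans are usually true and all the others are usually false. But I never see any variation in this prediction, even on training data where the expected output is something else. I do occasionally see a 0.9999999 or a 0.000001 in place of a 0 or 1, but even that is rare.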
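One way to check whether a constant prediction simply mirrors the label frequencies is to compute the per-column base rates and the accuracy that a fixed "majority" guess would reach. The following is only a sketch in plain Python: the `labels` rows are hypothetical stand-ins, not data from the question, and the helper names are made up for illustration.

```python
# Hypothetical label rows (12 binary targets each); NOT the data from the question.
labels = [
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0],
]


def base_rates(rows):
    """Fraction of rows in which each label column equals 1."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def majority_baseline(rows):
    """Constant prediction: round each column's base rate to 0 or 1."""
    return [1 if rate >= 0.5 else 0 for rate in base_rates(rows)]


def per_label_accuracy(rows, prediction):
    """Per-column accuracy of one fixed prediction applied to every row."""
    n = len(rows)
    return [
        sum(1 for row in rows if row[j] == prediction[j]) / n
        for j in range(len(prediction))
    ]


rates = base_rates(labels)
baseline = majority_baseline(labels)
accuracy = per_label_accuracy(labels, baseline)

print("base rates:", rates)
print("majority baseline:", baseline)
print("baseline accuracy per label:", accuracy)
```

If the accuracy reported during training is close to what this constant baseline achieves, the network has probably not learned anything that depends on the input.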

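My takeaway is that I am somehow setting up the model to always predict the average case. Any feedback or advice would be greatly appreciated. Thanks in advance!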

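Edit: Thanks for the suggestions, everyone. After reading more about this, I think what is happening is that my output layer is becoming saturated. I will switch to softsign instead of sigmoid (and adjust the logic so that -1 is the lower bound instead of 0) and hope that helps.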
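The saturation suspected in the edit can be illustrated numerically. This is a minimal sketch in plain Python, not a diagnosis of the actual network: the pre-activation values `z` are arbitrary examples. It compares the sigmoid and softsign outputs and their derivatives as `z` grows.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_grad(z):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)


def softsign(z):
    """Softsign activation: z / (1 + |z|), with range (-1, 1)."""
    return z / (1.0 + abs(z))


def softsign_grad(z):
    """Derivative of softsign: 1 / (1 + |z|)^2."""
    return 1.0 / (1.0 + abs(z)) ** 2


# Arbitrary example pre-activations, chosen only to show the trend.
for z in (0.0, 2.0, 10.0, 20.0):
    print(
        f"z={z:5.1f}  sigmoid={sigmoid(z):.7f}  d_sigmoid={sigmoid_grad(z):.2e}  "
        f"softsign={softsign(z):.4f}  d_softsign={softsign_grad(z):.2e}"
    )
```

The sigmoid derivative shrinks exponentially with `|z|`, while the softsign derivative shrinks only quadratically, so softsign saturates more slowly. It still saturates, though, so large pre-activations remain worth investigating.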

Answer 1

You are using a sigmoid activation function for the output layer.

model.add( Dense( 12, activation='sigmoid') )

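Sigmoid squashes each output independently into the range between 0 and 1, and for large-magnitude inputs it saturates so that the values look like exactly 0 or 1. If the goal is to pick a single class, I think what you are looking for is the softmax activation function, which outputs values between 0 and 1 such that all 12 values sum to 1. You can then take the argmax to find the highest value and use it as your prediction.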
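To make the difference between the two activations concrete, here is a small sketch in plain Python. The `logits` values are hypothetical. Note that softmax makes the outputs compete for a total of 1, which fits mutually exclusive classes; per-output sigmoids do not have that constraint.

```python
import math


def sigmoid(z):
    """Logistic function applied to one value."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    """Softmax over a list; subtracting the max keeps exp() from overflowing."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, -1.0]  # hypothetical pre-activations of an output layer

independent = [sigmoid(v) for v in logits]  # each value lies in (0, 1) on its own
exclusive = softmax(logits)                 # values lie in (0, 1) and sum to 1

# argmax: index of the largest softmax probability
predicted_class = max(range(len(exclusive)), key=exclusive.__getitem__)

print("sigmoid:", independent, "sum =", sum(independent))
print("softmax:", exclusive, "sum =", sum(exclusive))
print("argmax:", predicted_class)
```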

Two other things. First, why are you using two activation functions in the hidden layers? Use one or the other, not both.

model.add( Dense( 50, activation='relu') )
model.add( LeakyReLU(alpha=.01) )

Second, mean squared error is meant for regression problems; from your description, this looks like a classification problem.
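One reason the choice of loss matters with a sigmoid output can be sketched by comparing gradients with respect to the pre-activation `z`. This is an illustration under stated assumptions, not a claim about the asker's exact training run: the example values `z = 10.0` and `y = 0.0` are hypothetical, standing for a confidently wrong output. In Keras, the cross-entropy variant corresponds to the `binary_crossentropy` loss.

```python
import math


def sigmoid(z):
    """Logistic function applied to one value."""
    return 1.0 / (1.0 + math.exp(-z))


def mse_grad_wrt_logit(z, y):
    """d/dz of (sigmoid(z) - y)**2, which is 2 * (p - y) * p * (1 - p)."""
    p = sigmoid(z)
    return 2.0 * (p - y) * p * (1.0 - p)


def bce_grad_wrt_logit(z, y):
    """d/dz of binary cross-entropy on sigmoid(z); it simplifies to p - y."""
    return sigmoid(z) - y


# Hypothetical case: the unit outputs roughly 1 while the target is 0.
z, y = 10.0, 0.0
mse_grad = mse_grad_wrt_logit(z, y)
bce_grad = bce_grad_wrt_logit(z, y)

print(f"MSE gradient: {mse_grad:.2e}")
print(f"BCE gradient: {bce_grad:.4f}")
```

With mean squared error, the factor `p * (1 - p)` nearly cancels the gradient once the unit has saturated, so a confidently wrong output barely gets corrected. With cross-entropy, the gradient stays close to the size of the error.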
