I have an Excel worksheet containing a set of data (basically several columns), where the last column, called "Score", holds values derived from the preceding columns (decimals from 0.000 to 100.000). There are 27,000 rows in total.
What I want to achieve is to predict the Score for new data, so I first train a model on the Excel sheet.
# Assumed imports for this snippet (standalone Keras with a TF 1.x backend)
import numpy as np
import pandas as pd
from tensorflow import set_random_seed
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization

set_random_seed(7)
dataframe = pd.read_excel('SS_Abcde.xlsx')
data = dataframe.iloc[:, 3:23]
labels_column = np.array(dataframe[['Score']])
print(labels_column.shape[0])
print("printing len of score", len(labels_column))
uniqueData = (np.unique(labels_column))
print("printing unique len of score", len(uniqueData))
labels_column = to_categorical(labels_column)
labels_column = [labels_column]
training_data = data
training_labels = labels_column
print("Start the training of the model")
model = Sequential()
#model.add(BatchNormalization())
model.add(Dense(4, input_dim=20, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(100, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
print("model Fitting")
model.fit(training_data, training_labels, epochs=5, verbose=1)
print("training has done")
But when I run it, it gives me this error:
ValueError: Error when checking target: expected dense_3 to have shape (100,) but got array with shape (86,)
If I change
model.add(Dense(100, activation='softmax'))
to this:
model.add(Dense(86, activation='softmax'))
it works, and the model starts training for the defined number of epochs. But why? Why doesn't it work with Dense(100)? Isn't that just the output layer?
Edit:
Following @Reza Behzadpou's answer, I normalized my complete dataset and updated the code as shown below:
model = Sequential()
#model.add(BatchNormalization())
model.add(Dense(4, input_dim=20, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
print("model Fitting")
model.fit(training_data, training_labels, epochs=150, verbose=1)
print("training has done")
Xnew = np.array([[4.14854335054294e-21, 1, 1.36799259164156e-05, 1, 0, 0, 0, 0, 0, 1, 5.44716062111488e-06, 1, 0, 0, 0, 0, 0, 0, 1, 1]])
#Note that the above set of data already exists in the csv and its Score is 0.6137532, so I am expecting a somewhat close prediction
ynew=model.predict(Xnew)
print("X=%s, Predicted=%s" % (Xnew[0], ynew[0]))
Output:
I get a predicted value of 1 instead of anything close to 0.6137532.
The method I used to normalize the dataset:
def GetNormalizedValue(val, min, max):
    if min == max:
        return 0
    denominator = max - min
    numerator = float(val) - min
    value = numerator / denominator
    return value
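For reference, applying this helper to every feature column looks roughly like this (a minimal sketch, assuming the data DataFrame from the first snippet):
# Min-max scale every feature column in place with GetNormalizedValue.
for col in data.columns:
    col_min, col_max = data[col].min(), data[col].max()
    data[col] = data[col].apply(lambda v: GetNormalizedValue(v, col_min, col_max))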
Edit 2:
I even tried it with MinMaxScaler():
# Assumed imports for this snippet
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense

dataset = np.loadtxt("SS_Munir_Updated.csv", delimiter=",")
x=dataset[:, 0:20]
y=dataset[:, 20]
y=np.reshape(y, (-1,1))
scaler = MinMaxScaler()
print(scaler.fit(x))
print(scaler.fit(y))
xscale=scaler.transform(x)
yscale=scaler.transform(y)
X_train, X_test, y_train, y_test = train_test_split(xscale, yscale)
model = Sequential()
model.add(Dense(12, input_dim=20, kernel_initializer='normal', activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.summary()
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# model.compile(loss='mse', optimizer='adam', metrics=['mse','mae'])
history = model.fit(X_train, y_train, epochs=150, verbose=1)
Xnew = np.array([[0.00000000000000000000414854335054294, 1, 0.0000136799259164156, 1, 0, 0, 0, 0, 0, 1,
0.00000544716062111488, 1, 0, 0, 0, 0, 0, 0, 1, 1]])
ynew=model.predict(Xnew)
print("X=%s, Predicted=%s" % (Xnew[0], ynew[0]))
Output:
1, instead of any value close to 0.6137532.
Answer (score: 1):
That is because there are 86 distinct values in the Score column, and for the softmax layer to classify them, it needs 86 distinct neurons.
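You can check this directly (a minimal sketch, assuming the labels_column array from your first snippet):
import numpy as np
from keras.utils import to_categorical

# The width of the one-hot encoded target is what the last Dense layer must match.
one_hot = to_categorical(labels_column)
print(one_hot.shape[1])               # 86, matching the error message
print(len(np.unique(labels_column)))  # number of distinct Score values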
A few things to note here:
After you normalize the dataset, change your code as follows:
Change the output layer like this:
model.add(Dense(1, activation='sigmoid'))
and compile it like this:
model.compile(loss='binary_crossentropy', optimizer='adam')
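Put together, the suggestion looks roughly like this (a minimal sketch, assuming x and y hold the 20 feature columns and the Score column as NumPy arrays, each scaled with its own MinMaxScaler):
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense

# Scale the features and the Score column to [0, 1] with separate scalers,
# then train the single sigmoid-output network suggested above.
x_scaler, y_scaler = MinMaxScaler(), MinMaxScaler()
X = x_scaler.fit_transform(x)   # x: (n_samples, 20) feature array
Y = y_scaler.fit_transform(y)   # y: (n_samples, 1) Score column

model = Sequential()
model.add(Dense(4, input_dim=20, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, Y, epochs=150, verbose=1)

# New rows must go through the same x_scaler before predict(), and the
# prediction can be mapped back to the original Score range with
# y_scaler.inverse_transform().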
Hope that helps.