I have an Excel worksheet containing a set of data (basically several columns), where the last column, called "Score", holds values derived from the preceding columns (decimals from 0.000 to 100.000). There are 27,000 rows in total.
What I want to achieve is to predict the Score for new data, so I first train a model on the Excel sheet.
# Assumed imports for this snippet (standalone Keras with a TF 1.x backend)
import numpy as np
import pandas as pd
from tensorflow import set_random_seed
from keras.utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense, BatchNormalization

set_random_seed(7)
dataframe = pd.read_excel('SS_Abcde.xlsx')
data = dataframe.iloc[:, 3:23]
labels_column = np.array(dataframe[['Score']])
print(labels_column.shape[0])
print("printing len of score", len(labels_column))
uniqueData = (np.unique(labels_column))
print("printing unique len of score", len(uniqueData))
labels_column = to_categorical(labels_column)
labels_column = [labels_column]
training_data = data
training_labels = labels_column
print("Start the training of the model")
model = Sequential()
#model.add(BatchNormalization())
model.add(Dense(4, input_dim=20, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(100, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
print("model Fitting")
model.fit(training_data, training_labels, epochs=5, verbose=1)
print("training has done")
But when I run it, it gives me this error:
ValueError: Error when checking target: expected dense_3 to have shape (100,) but got array with shape (86,)
If I change
model.add(Dense(100, activation='softmax'))
to this:
model.add(Dense(86, activation='softmax'))
it works, and the model starts training for the defined number of epochs. But why? Why doesn't it work with Dense(100)? Isn't that just the output layer?
Edit:
Following @Reza Behzadpou's answer, I normalized my complete dataset and updated the code as shown below:
model = Sequential()
#model.add(BatchNormalization())
model.add(Dense(4, input_dim=20, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
print("model Fitting")
model.fit(training_data, training_labels, epochs=150, verbose=1)
print("training has done")
Xnew = np.array([[4.14854335054294e-21, 1, 1.36799259164156e-05, 1, 0, 0, 0, 0, 0, 1, 5.44716062111488e-06, 1, 0, 0, 0, 0, 0, 0, 1, 1]])
#Note that the above set of data already exists in the csv and its Score is 0.6137532, so I am expecting a somewhat close prediction
ynew=model.predict(Xnew)
print("X=%s, Predicted=%s" % (Xnew[0], ynew[0]))
Output:
I get a predicted value of 1 instead of anything close to 0.6137532.
The method I used to normalize the dataset:
def GetNormalizedValue(val, min, max):
    if min == max:
        return 0
    denominator = max - min
    numerator = float(val) - min
    value = numerator / denominator
    return value
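For reference, applying this helper to every feature column looks roughly like this (a minimal sketch, assuming the data DataFrame from the first snippet):
# Min-max scale every feature column in place with GetNormalizedValue.
for col in data.columns:
    col_min, col_max = data[col].min(), data[col].max()
    data[col] = data[col].apply(lambda v: GetNormalizedValue(v, col_min, col_max))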
Edit 2:
I even tried it with MinMaxScaler():
# Assumed imports for this snippet
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense

dataset = np.loadtxt("SS_Munir_Updated.csv", delimiter=",")
x=dataset[:, 0:20]
y=dataset[:, 20]
y=np.reshape(y, (-1,1))
scaler = MinMaxScaler()
print(scaler.fit(x))
print(scaler.fit(y))
xscale=scaler.transform(x)
yscale=scaler.transform(y)
X_train, X_test, y_train, y_test = train_test_split(xscale, yscale)
model = Sequential()
model.add(Dense(12, input_dim=20, kernel_initializer='normal', activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.summary()
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# model.compile(loss='mse', optimizer='adam', metrics=['mse','mae'])
history = model.fit(X_train, y_train, epochs=150, verbose=1)
Xnew = np.array([[0.00000000000000000000414854335054294, 1, 0.0000136799259164156, 1, 0, 0, 0, 0, 0, 1,
0.00000544716062111488, 1, 0, 0, 0, 0, 0, 0, 1, 1]])
ynew=model.predict(Xnew)
print("X=%s, Predicted=%s" % (Xnew[0], ynew[0]))
Output:
1, instead of any value close to 0.6137532.
Answer (score: 1):
That is because there are 86 distinct values in the Score column, and for the softmax layer to classify them, it needs 86 distinct neurons.
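You can check this directly (a minimal sketch, assuming the labels_column array from your first snippet):
import numpy as np
from keras.utils import to_categorical

# The width of the one-hot encoded target is what the last Dense layer must match.
one_hot = to_categorical(labels_column)
print(one_hot.shape[1])               # 86, matching the error message
print(len(np.unique(labels_column)))  # number of distinct Score values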
A few things to note here:
After you normalize the dataset, change your code as follows:
Change the output layer like this:
model.add(Dense(1, activation='sigmoid'))
and compile it like this:
model.compile(loss='binary_crossentropy', optimizer='adam')
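Put together, the suggestion looks roughly like this (a minimal sketch, assuming x and y hold the 20 feature columns and the Score column as NumPy arrays, each scaled with its own MinMaxScaler):
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense

# Scale the features and the Score column to [0, 1] with separate scalers,
# then train the single sigmoid-output network suggested above.
x_scaler, y_scaler = MinMaxScaler(), MinMaxScaler()
X = x_scaler.fit_transform(x)   # x: (n_samples, 20) feature array
Y = y_scaler.fit_transform(y)   # y: (n_samples, 1) Score column

model = Sequential()
model.add(Dense(4, input_dim=20, activation='relu'))
model.add(Dense(4, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam')
model.fit(X, Y, epochs=150, verbose=1)

# New rows must go through the same x_scaler before predict(), and the
# prediction can be mapped back to the original Score range with
# y_scaler.inverse_transform().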
Hope that helps.