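I am currently using cross-validation to train my regression network. I don't have any labels, but rather specific inputs that should map to specific outputs, and the network should learn that mapping. I seem to have a problem with how I am defining the folds.
This is how I do the cross-validation: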
############################### Training setup ##################################
#Define 10 folds:
seed = 7
np.random.seed(seed)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
print "Splits"
cvscores_loss = []
for train, test in kfold.split(train_set_data_vstacked_normalized,train_set_output_vstacked):
    print "Model definition!"
    model = Sequential()
    #act = PReLU(init='normal', weights=None)
    model.add(Dense(output_dim=400,input_dim=400, init="normal",activation=K.tanh))
    #act1 = PReLU(init='normal', weights=None)
    model.add(Dense(output_dim=400,input_dim=400, init="normal",activation=K.tanh))
    #act2 = PReLU(init='normal', weights=None)
    model.add(Dense(output_dim=400, input_dim=400, init="normal",activation=K.tanh))
    act4=ELU(10000)
    model.add(Dense(output_dim=13, input_dim=300, init="normal",activation=act4))
    print "Compiling"
    model.compile(loss='mean_squared_error', optimizer='RMSprop', metrics=["accuracy"])
    print "Compile done! "
    print '\n'
    print "Train start"
    model.fit(train_set_data_vstacked_normalized[train],train_set_output_vstacked[train], nb_epoch=10, verbose=1)
    loss, accuracy = model.evaluate(x=train_set_data_vstacked_normalized[test],y=train_set_output_vstacked[test],verbose=1)
    print
    print('loss: ', loss)
    print('accuracy: ', accuracy)
    print()
    print model.summary()
    print "New Model:"
    cvscores_loss.append(loss)
print("%.2f%% (+/- %.2f%%)" % (numpy.mean(cvscores_loss), numpy.std(cvscores_loss)))
The problem with this code is that I never enter the for loop. After "Splits" is printed I get a warning message. Here it is:
Splits
/home/k/.local/lib/python2.7/site-packages/sklearn/model_selection/_split.py:579: Warning: The least populated class in y has only 1 members, which is too few. The minimum number of groups for any class cannot be less than n_splits=10.
This makes me wonder: how does kfold know what the input and output dimensions of my neural network are?
Should I define them somewhere? If so, how?
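Answer 0 (score: 1)
The message tells you what the problem is: one of your target classes has only one member. To stratify into 10 folds, every class needs at least 10 members so that one of them can go into each fold.
You need to check the counts of your target classes, find the offending class, and remove it.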
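One way to do that check is sketched below, using only the standard library. The helper name `classes_too_small` and the example labels are made up for illustration; they are not from the question. Note that if the targets are continuous, as in a regression task, almost every value ends up being its own "class", so nearly everything would be flagged.

```python
from collections import Counter


def classes_too_small(labels, n_splits):
    """Return {label: count} for every class with fewer than n_splits members."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < n_splits}


# Hypothetical labels: class "c" appears only once, so it cannot be
# spread across 10 stratified folds.
example_labels = ["a"] * 12 + ["b"] * 10 + ["c"]
print(classes_too_small(example_labels, n_splits=10))
```

Any label reported here is one that a 10-fold stratified split cannot place in every fold.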
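Answer 1 (score: 0)
I think you are overcomplicating this. If you need cross-validation on a Keras model, you can use the Keras scikit-learn API. To do that, you need to:
Import a few things: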
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import cross_val_score
Create a function that defines the model:
def model_creation():
    model = Sequential()
    model.add(...)
    ...
    model.compile(...)
    return model
And use the wrapper:
model = KerasClassifier(build_fn=model_creation, nb_epoch=100, batch_size=100, verbose=0)
kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
results = cross_val_score(model, X, y, cv=kfold)
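On the question of how the splitter knows the network's dimensions: as far as I understand it, it does not need to. A k-fold splitter only produces index arrays over the samples, and those indices are then used to slice the data, whatever its shape. The sketch below is a hand-rolled, non-stratified illustration of that idea, not scikit-learn's implementation; `k_fold_indices` is a hypothetical helper. In scikit-learn, the non-stratified splitter is `KFold`, which may be the better fit when the targets are continuous.

```python
import random


def k_fold_indices(n_samples, n_splits, seed=0):
    """Yield (train_idx, test_idx) lists for a shuffled, non-stratified k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_splits-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx


# Only the number of samples matters here; the feature and output
# dimensions of the model never enter the split.
for train_idx, test_idx in k_fold_indices(n_samples=25, n_splits=10):
    print(len(train_idx), len(test_idx))
```

Each pair of index lists can then be used to select rows from the input and target arrays, the same way `train` and `test` are used in the question's loop.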