Weak correlation in SKlearn even when y-training is included in the x-training set

Time: 2016-07-31 19:18:42

Tags: python machine-learning scikit-learn neural-network

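We are using Python + scikit-learn with MLPClassifier, and we are getting relatively poor results. As a sanity check, we tried adding the desired y output to the x input set. In that case you would expect a 100% score. That is not what happens: the score is quite low (20%). That is much better than random guessing, but still very poor. We have about 150 inputs (mostly booleans) and 1 output, which is an integer between 1 and 1500. When we bin that number into about 5 categories (integers from 0 to 4), the score is around 96%.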
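The sanity check described above (appending the target as an extra input column) can be sketched as follows. This is a minimal sketch on hypothetical synthetic data, since xkey.csv and ykey.csv are not available; the sample count of 2000 is an assumption, as the question does not state it. It also counts how many test labels never occur in the training split, which may or may not matter for the real data: with roughly 1500 possible classes, a classifier cannot predict a label it has never seen, no matter how informative the leaked column is.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data (assumption): ~150 mostly-boolean features and
# integer targets in 1..1500, mimicking the shapes described in the question.
rng = np.random.RandomState(0)
n_samples = 2000  # assumption: the real sample count is not given
X = rng.randint(0, 2, size=(n_samples, 150)).astype(float)
y = rng.randint(1, 1501, size=n_samples)

# The sanity check: append the target itself as one extra feature column.
X_leak = np.column_stack([X, y])

X_train, X_test, y_train, y_test = train_test_split(
    X_leak, y, test_size=0.10, random_state=0)

# Fraction of test samples whose label never appears in the training split;
# a classifier can never predict those labels correctly.
unseen = ~np.isin(y_test, np.unique(y_train))
print("test labels absent from training: %.1f%%" % (100 * unseen.mean()))
```

Running the same two lines (`np.unique` on `y_train`, `np.isin` on `y_test`) against the real split would show whether this effect is present in the actual data.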

import numpy as np
from sklearn.neural_network import MLPClassifier
# sklearn.cross_validation was deprecated in 0.18 and removed in 0.20
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Inputs: ~150 columns, mostly booleans; target: one integer in 1..1500
xset = np.genfromtxt('xkey.csv', delimiter=",")
yset = np.genfromtxt('ykey.csv', delimiter=",")
# Optional: bin the target into 5 categories (integers 0..4)
# yset = np.rint((5 - 1) * yset / np.max(yset))
# Count the labels actually present (np.max(yset) + 1 over-counts when labels start at 1)
print("Number of categories: " + str(len(np.unique(yset))))
X_train, X_test, y_train, y_test = train_test_split(xset, yset, test_size=0.10)

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

# Released scikit-learn versions (0.18+) name this parameter `solver`, not `algorithm`
clf = MLPClassifier(solver='adam', alpha=1e-5, hidden_layer_sizes=(100, 20))
clf.fit(X_train, y_train)

score = clf.score(X_test, y_test)  # mean accuracy on the test split
print("score: " + str(score))

ypredicted = clf.predict(X_test)
# sklearn metrics take (y_true, y_pred) in that order
sqerr = mean_squared_error(y_test, ypredicted)
err = mean_absolute_error(y_test, ypredicted)

print("err: " + str(err))
print("sqerr: " + str(sqerr))
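The commented-out binning line in the script can be wrapped as a small function to see exactly how the 1..1500 range maps onto the 5 categories. This is a sketch; the name `bin_target` is hypothetical and only mirrors the expression `np.rint((5-1)*(yset)/np.max(yset))` from the question.

```python
import numpy as np

def bin_target(y, n_bins=5):
    """Map integer targets onto 0..n_bins-1 by scaling to [0, n_bins-1]
    and rounding, as in the question's commented-out line (hypothetical helper)."""
    y = np.asarray(y, dtype=float)
    return np.rint((n_bins - 1) * y / np.max(y)).astype(int)

y = np.array([1, 200, 500, 750, 1125, 1500])
print(bin_target(y))  # → [0 1 1 2 3 4]
```

Because of the rounding, the edge categories are about half as wide as the middle ones: for y in 1..1500, category 0 covers 1 to 187 and category 4 covers 1313 to 1500, while categories 1 to 3 each cover 375 values.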
