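I'm fairly new to machine learning, but I'm trying to use scikit-learn's MLPRegressor to model data with 4 inputs and 1 output. The dataset has more inputs and outputs, but I believe the inputs I picked are the only ones relevant to the output I chose. There are about 60,000 samples in the dataset. The model learns most of the data, but its output seems to be capped by an upper and a lower bound.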
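I've tried many different combinations of hyperparameters, but none of them gets rid of these apparent bounds on the output. I also tried normalizing the data, but that didn't really help. For this particular set of hyperparameters, the loss is 106.555 and the score (R²) is 0.998 on both the training and the test data.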
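For reference, "normalizing" can mean scaling only the inputs or also the target (the output here spans roughly -700 to 700). Below is a minimal sketch of scaling both with scikit-learn's `make_pipeline`, `StandardScaler` and `TransformedTargetRegressor`. The helper name `make_scaled_mlp`, the synthetic data and the `activation='relu'` choice are assumptions for illustration, not part of the original code:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_scaled_mlp(**mlp_params):
    """Hypothetical helper: MLPRegressor with standardized X and standardized y.

    The StandardScaler on X is fitted inside the pipeline, so it only sees the
    data passed to fit(). TransformedTargetRegressor standardizes y before
    training and inverts that on predict(), so predictions come back in the
    original units.
    """
    net = make_pipeline(StandardScaler(), MLPRegressor(**mlp_params))
    return TransformedTargetRegressor(regressor=net, transformer=StandardScaler())


# Synthetic stand-in data (hypothetical): 4 inputs, target on the order of hundreds
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(2000, 4))
y = 150 * X[:, 0] - 80 * X[:, 1] + 20 * X[:, 2] * X[:, 3]

model = make_scaled_mlp(hidden_layer_sizes=(32, 32), activation='relu',
                        max_iter=500, random_state=1)
model.fit(X, y)
print('R^2 on training data: {:.3f}'.format(model.score(X, y)))
```

With `activation='relu'` the hidden units are not bounded above, unlike `'logistic'`; whether that, or scaling the target, changes anything for this particular dataset is something to test rather than assume.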
Here is the code:
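# Imports
import csv
import winsound  # Windows-only; used to play a sound when training finishes
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
# Importing the data from a .csv file (data_path is the path to that file)
input_cols = [3, 4, 7, 10]  # column indices read as inputs; check these against the CSV
output_cols = [9]           # column index read as the output
X, y = [], []
with open(data_path, 'r', newline='') as data:
    reader = csv.reader(data)
    for line in reader:
        try:
            a = [float(line[c]) for c in input_cols]
            b = float(line[output_cols[0]])  # scalar target, so y ends up 1-D as MLPRegressor expects
        except ValueError:
            # Rows that aren't numeric (e.g. a header row) are skipped
            print('ValueError: skipping row')
            continue
        X.append(a)
        y.append(b)
all_data = [X, y]
print('Done importing data')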
# Splitting the data for training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state = 42)
# Training the neural network
model = MLPRegressor(max_iter=10**4, verbose=True, hidden_layer_sizes=(10, 10, 10), tol=0.00001,
                     learning_rate_init=0.005, random_state=1, activation='logistic', solver='adam')
print('Beginning training')
model.fit(X_train, y_train)
print('Training complete')
# Results
winsound.PlaySound('C:/Windows/media/Windows Background.wav', winsound.SND_FILENAME)
print('Score on training data: {:.3f}'.format(model.score(X_train, y_train)))
print('Score on testing data: {:.3f}'.format(model.score(X_test, y_test)))
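One fact that may bear on the upper and lower bounds: with `activation='logistic'`, every hidden unit outputs a value between 0 and 1, and MLPRegressor's output layer is linear (identity), so a prediction can never leave a fixed interval set by the last layer's weights and bias. A minimal sketch that computes that interval from the fitted `coefs_` and `intercepts_` and checks it against predictions far outside the training range; the helper `output_range` and the synthetic data are hypothetical:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor


def output_range(model):
    """Hypothetical helper: interval that a fitted single-output MLPRegressor
    with activation='logistic' can never leave.

    The last hidden layer's activations h_j lie in [0, 1] and the output layer
    is linear, so prediction = b + sum_j w_j * h_j. The minimum sets h_j = 1 for
    negative weights and 0 for positive ones; the maximum does the opposite.
    """
    w = model.coefs_[-1][:, 0]    # weights from the last hidden layer to the output
    b = model.intercepts_[-1][0]  # output bias
    return b + w[w < 0].sum(), b + w[w > 0].sum()


# Synthetic stand-in data (hypothetical): a target that keeps growing with x
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(500, 1))
y = 100 * X[:, 0]

model = MLPRegressor(hidden_layer_sizes=(10,), activation='logistic',
                     max_iter=500, random_state=1).fit(X, y)

low, high = output_range(model)
X_far = np.linspace(-1000, 1000, 201).reshape(-1, 1)  # far outside the training range
pred = model.predict(X_far)
print('possible output range: [{:.1f}, {:.1f}]'.format(low, high))
print('predictions span:      [{:.1f}, {:.1f}]'.format(pred.min(), pred.max()))
```

Only the last hidden layer feeds the output, so the same helper should apply to a model with three logistic hidden layers like the one above, e.g. `output_range(model)` after `model.fit(X_train, y_train)`.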
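Below are plots of the results. Each plot shows one of the 4 inputs on the x-axis and the output on the y-axis. Red is the network's prediction and blue is the actual data. I had to remove the axes for privacy reasons, but the y-axis runs from -700 to 700.
[Results plots]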
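Panels like these (one per input, prediction in red, actual data in blue) can be drawn with matplotlib. A minimal sketch; the function name `plot_predictions`, the output filename and the synthetic arrays are hypothetical, with the arrays only standing in for `X_test`, `y_test` and `model.predict(X_test)`:

```python
import matplotlib
matplotlib.use('Agg')  # draw to a file; no display needed
import matplotlib.pyplot as plt
import numpy as np


def plot_predictions(X, y_true, y_pred, path='results.png'):
    """Hypothetical helper: one panel per input column, with the actual
    output in blue and the network's prediction in red."""
    X = np.asarray(X)
    n_inputs = X.shape[1]
    fig, axes = plt.subplots(1, n_inputs, figsize=(4 * n_inputs, 3.5),
                             sharey=True, squeeze=False)
    for j, ax in enumerate(axes[0]):
        ax.scatter(X[:, j], y_true, s=1, c='blue', label='actual')
        ax.scatter(X[:, j], y_pred, s=1, c='red', label='predicted')
        ax.set_xlabel('input {}'.format(j + 1))
    axes[0][0].set_ylabel('output')
    axes[0][0].legend(markerscale=5)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return path


# Synthetic stand-in data (hypothetical), shaped like the 4-input problem
rng = np.random.default_rng(0)
X_demo = rng.uniform(-1, 1, size=(1000, 4))
y_demo = 700 * np.tanh(X_demo.sum(axis=1))
y_hat = np.clip(y_demo, -500, 500)  # an artificially capped "prediction", only for the picture
print(plot_predictions(X_demo, y_demo, y_hat))
```

With the real model this would be called as `plot_predictions(X_test, y_test, model.predict(X_test))`.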