How to use scikit-learn's Gaussian process with input features that have the same target value

Asked: 2016-07-16 16:13:44

Tags: python scikit-learn regression gaussian

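I am new to scikit-learn and am trying to learn how to use Gaussian process regression.

I am trying to use a data set in which some input values are repeated, for example: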

array(x,y) = [[10, 10, 20, 20, 15, 17], [30, 40, 50, 60, 50, 40]]

While following the scikit-learn documentation for Gaussian process regression, I ran into the following error:

C:\Python27\lib\site-packages\sklearn\gaussian_process\gaussian_process.pyc in fit(self, X, y)
    298         if (np.min(np.sum(D, axis=1)) == 0.
    299                 and self.corr != correlation.pure_nugget):
--> 300             raise Exception("Multiple input features cannot have the same"
    301                             " target value.")
    302     
Exception: Multiple input features cannot have the same target value.
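
Judging from the lines shown in the traceback, the check fires when some pair of rows in X is at zero distance from each other, that is, when the same input value appears more than once (despite the wording of the message, it is the inputs that are compared, not the targets). A minimal sketch, assuming a single-feature input like the sample data above, that lists which input values are repeated. The helper name `find_repeated_inputs` is hypothetical and not part of scikit-learn:

```python
import numpy as np


def find_repeated_inputs(x):
    """Return the sorted input values that occur more than once in x.

    Hypothetical helper for illustration; assumes a single input feature.
    """
    values, counts = np.unique(np.asarray(x).ravel(), return_counts=True)
    return values[counts > 1]


# Sample inputs from the question: 10 and 20 each appear twice.
x = [10, 10, 20, 20, 15, 17]
print(find_repeated_inputs(x))
```

If this returns a non-empty array, `fit` in the legacy `GaussianProcess` class will most likely raise the exception shown above.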

Here is my code:

import numpy as np
from matplotlib import pyplot as plt
from sklearn.gaussian_process import GaussianProcess


# Load the CSV file, skipping the header row
dataset = np.loadtxt("data.csv", delimiter=",", skiprows=1)

# Split the columns into inputs X (as a column vector) and targets y
X = np.atleast_2d(dataset[:, 0]).T
y = dataset[:, 1].ravel()

# Range of the x-axis for plotting (avoid shadowing the built-ins min/max)
x_min = np.amin(dataset[:, 0])
x_max = np.amax(dataset[:, 0])

x = np.atleast_2d(np.linspace(x_min, x_max, 1000)).T

# Instantiate a Gaussian Process model
gp = GaussianProcess(corr='cubic', theta0=1e-2, thetaL=1e-4, thetaU=1e-1,
                     random_start=100)

# Fit to data using Maximum Likelihood Estimation of the parameters
gp.fit(X, y)

Is it possible to have repeated values in the input data set? If so, how should I go about it?
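
One possible workaround, offered only as a sketch and not as the library's recommended approach, is to collapse repeated inputs before fitting by averaging the targets that share the same input value. The helper name `average_repeated_inputs` is hypothetical, and the code assumes a single input feature as in the sample data:

```python
import numpy as np


def average_repeated_inputs(x, y):
    """Collapse repeated x values, averaging the y values that share them.

    Hypothetical helper for illustration; assumes a single input feature.
    Returns (unique_x, mean_y), with unique_x sorted in ascending order.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    unique_x, inverse = np.unique(x, return_inverse=True)
    sums = np.bincount(inverse, weights=y)   # sum of y per unique x
    counts = np.bincount(inverse)            # number of samples per unique x
    return unique_x, sums / counts


x = [10, 10, 20, 20, 15, 17]
y = [30, 40, 50, 60, 50, 40]
unique_x, mean_y = average_repeated_inputs(x, y)

# Column-vector form, as expected by fit(X, y)
X = unique_x.reshape(-1, 1)
```

Averaging throws away the spread between repeated measurements, so it is a trade-off rather than a full answer. In scikit-learn 0.18 and later, the newer `GaussianProcessRegressor` class has an `alpha` parameter that adds a value to the diagonal of the kernel matrix, which may be a better fit for noisy, repeated observations.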

0 answers:

No answers yet