Plotting a linear regression from a list of DataFrames

Time: 2019-03-14 21:08:00

Tags: python matplotlib scikit-learn linear-regression

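I am trying to plot a linear regression of the data below, and sklearn's LinearRegression.fit() function raises the following error: ValueError: Expected 2D array, got 1D array instead:. I am not sure how to proceed. I have done a lot of research on this forum into how to plot a regression and how to pull the DataFrames out of a list for analysis and plotting. I tried np.reshape and converting to an array, to no avail. Even fit(X, Y[0]) fails.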

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd 
from sklearn.linear_model import LinearRegression


colors = ['r','g','b','k', 'y', 'c', 'orange', 'm', 'darkviolet', 'lawngreen', 'firebrick']
dataset = pd.read_csv('data.csv', index_col=False)

# replace all instances of 'x' with NaN (a real missing value, not the string 'NaN')
dataset = dataset.replace(to_replace='x', value=np.nan)

# get X, which is 1-10 in this case
X = dataset.iloc[:,0] 

# number of rows in X (also used below as the number of Y columns)
lenX = len(X)

def get_Y(dataset, iterations):
    '''
    gets Y and the mean of each set of Y
    '''
    Y_list, Y_mean = [], []
    i = 1

    while i<(iterations+1): 
        Y = dataset.iloc[:,i]
        Y = pd.to_numeric(Y, errors='coerce') # change object dataframe to float64
        Y_list.append(Y) 
        Y_mean.append(Y.mean()) # get mean
        i += 1
    return Y_list, Y_mean

Y, Y_mean = get_Y(dataset, lenX)

# plotting all 10 lines
for i in range(len(X)):
    # label each line with its column name so plt.legend() has entries to show
    plt.plot(X, Y[i], colors[i], label=Y[i].name)

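# attempted reshapes (these did not help)
#newY = np.reshape(Y,100)
#newX = np.reshape(X,10)

# this is the call that fails with "ValueError: Expected 2D array, got 1D array instead"
LinearRegression().fit(newX,newY)
#reg.score(X,Y)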

plt.legend(loc='best')
plt.show()
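For context on the error itself: scikit-learn estimators expect X as a 2D array of shape (n_samples, n_features), while y may be 1D. The sketch below uses made-up toy data (not the values from data.csv) to show a 1D X being rejected and the same values being accepted after reshape(-1, 1). The names x, y, X2d, and error_message are only illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical toy data, not taken from data.csv
x = np.arange(1, 11, dtype=float)   # shape (10,): ten samples of one feature
y = 3.5 - 0.2 * x                   # shape (10,): exactly linear in x

# A 1D X is rejected by fit(); capture the message instead of crashing
try:
    LinearRegression().fit(x, y)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# reshape(-1, 1) turns (10,) into (10, 1): ten rows, one feature column
X2d = x.reshape(-1, 1)
model = LinearRegression().fit(X2d, y)
slope = model.coef_[0]
intercept = model.intercept_
```

Note that reshaping X to (10,) and Y to (100,), as in the commented-out lines above, leaves X one-dimensional and also gives X and Y different numbers of samples, so fit() would likely still fail even once the shape of X is fixed.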

This is in data.csv:

,1,2,3,4,5,6,7,8,9,10
1,3.5,3.4,3.0,3.6,3.5,3.1,3.2,3.5,3.0,3.5
2,2.9,2.6,2.9,2.7,2.5,2.6,2.9,3.1,2.6,3.0
3,2.3,2.5,2.3,2.0,2.7,2.7,2.4,2.5,2.8,2.3
4,2.1,2.4,2.3,2.4,2.6,2.1,2.0,2.6,2.2,2.2
5,2.2,1.9,2.0,2.3,2.1,2.0,2.1,1.8,1.9,1.8
6,1.9,2.0,2.1,2.2,1.8,2.3,2.2,1.8,2.1,1.7
7,1.9,2.1,2.1,2.3,1.9,2.3,2.1,2.0,2.2,2.0
8,x,2.2,2.1,2.3,1.9,2.3,2.1,2.9,x,2.1
9,x,1.9,x,2.2,x,2.2,1.9,x,x,1.8
10,x,1.9,x,2.1,x,x,2.1,x,x,2.0

Plot without regression.
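One possible way to get a regression line onto this plot, sketched under assumptions: the ten columns are treated as repeated measurements of the same quantity, so a single line is fitted to all non-missing points pooled together. If a separate fit per column is wanted instead, the same steps would apply column by column. The CSV text is inlined here only to keep the sketch self-contained, and names such as long_df and x_line are illustrative.

```python
import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Contents of data.csv from the question, inlined for a self-contained sketch
CSV_TEXT = """,1,2,3,4,5,6,7,8,9,10
1,3.5,3.4,3.0,3.6,3.5,3.1,3.2,3.5,3.0,3.5
2,2.9,2.6,2.9,2.7,2.5,2.6,2.9,3.1,2.6,3.0
3,2.3,2.5,2.3,2.0,2.7,2.7,2.4,2.5,2.8,2.3
4,2.1,2.4,2.3,2.4,2.6,2.1,2.0,2.6,2.2,2.2
5,2.2,1.9,2.0,2.3,2.1,2.0,2.1,1.8,1.9,1.8
6,1.9,2.0,2.1,2.2,1.8,2.3,2.2,1.8,2.1,1.7
7,1.9,2.1,2.1,2.3,1.9,2.3,2.1,2.0,2.2,2.0
8,x,2.2,2.1,2.3,1.9,2.3,2.1,2.9,x,2.1
9,x,1.9,x,2.2,x,2.2,1.9,x,x,1.8
10,x,1.9,x,2.1,x,x,2.1,x,x,2.0
"""

# na_values='x' converts the 'x' placeholders to NaN at read time,
# so every data column is already float64
dataset = pd.read_csv(io.StringIO(CSV_TEXT), index_col=0, na_values="x")

# Reshape to long format: one row per (x, y) observation, missing values dropped
long_df = (
    dataset.rename_axis("x")
    .reset_index()
    .melt(id_vars=["x"], var_name="series", value_name="y")
    .dropna(subset=["y"])
)

# Double brackets keep X two-dimensional: shape (n_samples, 1)
X = long_df[["x"]].to_numpy(dtype=float)
y = long_df["y"].to_numpy(dtype=float)
reg = LinearRegression().fit(X, y)

# Plot each original column, then the pooled regression line on top
fig, ax = plt.subplots()
for name, column in dataset.items():
    ax.plot(dataset.index, column, label=f"series {name}")

x_line = np.array([[dataset.index.min()], [dataset.index.max()]], dtype=float)
ax.plot(x_line.ravel(), reg.predict(x_line), "k--", linewidth=2, label="linear fit")
ax.legend(loc="best")
```

Calling plt.show() afterwards displays the figure. Dropping the NaN rows before fitting matters because LinearRegression does not accept missing values in either X or y.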

0 answers:

No answers