model.score returns NaN in scikit-learn

Time: 2018-01-31 14:26:06

Tags: python pandas numpy scikit-learn

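I am trying to build a linear regression model that predicts a car's mileage from several numeric features. When I call model.score, I get a NaN error. My dataset does not contain any null values. The code and its output are below. Any help is appreciated. Thanks in advance.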

import pandas as p
import numpy as np
d=p.read_csv('cars.csv')

d=d.drop('Make', axis=1)
d=d.drop('Model', axis=1)
d=d.drop('Engine_Fuel_Type', axis=1)
d=d.drop('Number_of_Doors', axis=1)
d=d.drop('Market_Category', axis=1)
d=d.drop('Vehicle_Size', axis=1)
d=d.drop('Vehicle_Style', axis=1)

d['Transmission_Type']=d['Transmission_Type'].replace({1: 'MANUAL', 2: 'AUTOMATIC', 3: 'AUTOMATED_MANUAL', 4: 'DIRECT_DRIVE', 5: 'UNKNOWN'})
d=p.get_dummies(d, columns=['Transmission_Type'])

print(d.head())

X=d.drop('city_mpg', axis=1)
y=d[['city_mpg']]

from sklearn.model_selection import train_test_split


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)

for idx, col_name in enumerate(X_train.columns):
   print("Co efficient for", col_name, " is ", model.coef_[0][idx])

intercept = model.intercept_[0]

print("Intercept is", intercept)


print(model.score(X_test, y_test))
The output I get is:

   Year  Engine_HP  Engine_Cylinders  highway_MPG  city_mpg  Popularity  \
0  2011        335               6.0           26        19        3916   
1  2011        300               6.0           28        19        3916   
2  2011        300               6.0           28        20        3916   
3  2011        230               6.0           28        18        3916   
4  2011        230               6.0           28        18        3916   

    MSRP  Transmission_Type_AUTOMATED_MANUAL  Transmission_Type_AUTOMATIC  \
0  46135                                   0                            0   
1  40650                                   0                            0   
2  36350                                   0                            0   
3  29450                                   0                            0   
4  34500                                   0                            0   

   Transmission_Type_DIRECT_DRIVE  Transmission_Type_MANUAL  \
0                               0                         1   
1                               0                         1   
2                               0                         1   
3                               0                         1   
4                               0                         1   

   Transmission_Type_UNKNOWN  
0                          0  
1                          0  
2                          0  
3                          0  
4                          0  
Co efficient for Year  is  0.12795086619034354
Co efficient for Engine_HP  is  -0.015142081226758822
Co efficient for Engine_Cylinders  is  -0.4874611334108649
Co efficient for highway_MPG  is  0.4410894679555171
Co efficient for Popularity  is  3.102779517592471e-05
Co efficient for MSRP  is  8.390933189373478e-06
Co efficient for Transmission_Type_AUTOMATED_MANUAL  is  -10.972343474157594
Co efficient for Transmission_Type_AUTOMATIC  is  -11.256303676369456
Co efficient for Transmission_Type_DIRECT_DRIVE  is  45.812234118377674
Co efficient for Transmission_Type_MANUAL  is  -11.211947388437244
Co efficient for Transmission_Type_UNKNOWN  is  -12.371639579413452
Intercept is -232.22076881867463
Traceback (most recent call last):
  File "C:\Users\mridu\OneDrive\Desktop\Pesu_io_ai\Week_2\Prog.py", line 38, in <module>
    print(model.score(X_test, y_test))
  File "C:\Users\mridu\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\base.py", line 386, in score
    return r2_score(y, self.predict(X), sample_weight=sample_weight,
  File "C:\Users\mridu\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\linear_model\base.py", line 256, in predict
    return self._decision_function(X)
  File "C:\Users\mridu\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\linear_model\base.py", line 239, in _decision_function
    X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
  File "C:\Users\mridu\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\utils\validation.py", line 453, in check_array
    _assert_all_finite(array)
  File "C:\Users\mridu\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\utils\validation.py", line 44, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

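Edit: The dataset that I'm using. Please also note that, although there are a few columns I did not use, I cannot send the original source because I have already modified the dataset to remove the rows with null values.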
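
The traceback ends inside predict, at check_array(X), which suggests the non-finite value is in X_test rather than in y_test; fit did not raise the same error on X_train. One way to test the "no null values" assumption is to count NaN and infinite entries per column. The sketch below is only illustrative: count_non_finite is a hypothetical helper, and sample is a made-up stand-in for X_test, since the actual cars.csv is not included here.

```python
import numpy as np
import pandas as pd


def count_non_finite(frame: pd.DataFrame) -> pd.Series:
    """Count, per numeric column, the entries that are NaN, +inf or -inf."""
    numeric = frame.select_dtypes(include=[np.number])
    values = numeric.to_numpy(dtype="float64")
    # np.isfinite is False for NaN and for both infinities.
    bad = ~np.isfinite(values)
    return pd.Series(bad.sum(axis=0), index=numeric.columns)


# Hypothetical stand-in for X_test (assumption: the real data is not available).
sample = pd.DataFrame({
    "Engine_HP": [335.0, np.nan, 300.0],
    "Engine_Cylinders": [6.0, 6.0, np.inf],
    "Year": [2011, 2011, 2011],
})

counts = count_non_finite(sample)
# Show only the columns that contain at least one NaN or infinite value.
print(counts[counts > 0])
```

In the script above, count_non_finite(X_test) could be called just before model.score; any column with a nonzero count would be a candidate for the cause. As a guess only: head() prints Engine_Cylinders as floats (6.0) while its neighbours print as integers, which is sometimes a sign that pandas upcast the column to hold NaN.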

0 answers:

No answers