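I am trying to build a linear regression model that predicts the mileage a car delivers, given some numeric values. When I call model.score, I get a NaN error. My dataset does not contain any null values. The code and its output are below. Any help is appreciated. Thanks in advance.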
import pandas as p
import numpy as np
d=p.read_csv('cars.csv')
d=d.drop('Make', axis=1)
d=d.drop('Model', axis=1)
d=d.drop('Engine_Fuel_Type', axis=1)
d=d.drop('Number_of_Doors', axis=1)
d=d.drop('Market_Category', axis=1)
d=d.drop('Vehicle_Size', axis=1)
d=d.drop('Vehicle_Style', axis=1)
d['Transmission_Type']=d['Transmission_Type'].replace({1: 'MANUAL', 2: 'AUTOMATIC', 3: 'AUTOMATED_MANUAL', 4: 'DIRECT_DRIVE', 5: 'UNKNOWN'})
d=p.get_dummies(d, columns=['Transmission_Type'])
print(d.head())
X=d.drop('city_mpg', axis=1)
y=d[['city_mpg']]
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
for idx, col_name in enumerate(X_train.columns):
    print("Co efficient for", col_name, " is ", model.coef_[0][idx])
intercept = model.intercept_[0]
print("Intercept is", intercept)
print(model.score(X_test, y_test))
The output I get is:
   Year  Engine_HP  Engine_Cylinders  highway_MPG  city_mpg  Popularity  \
0  2011        335               6.0           26        19        3916
1  2011        300               6.0           28        19        3916
2  2011        300               6.0           28        20        3916
3  2011        230               6.0           28        18        3916
4  2011        230               6.0           28        18        3916

    MSRP  Transmission_Type_AUTOMATED_MANUAL  Transmission_Type_AUTOMATIC  \
0  46135                                   0                            0
1  40650                                   0                            0
2  36350                                   0                            0
3  29450                                   0                            0
4  34500                                   0                            0

   Transmission_Type_DIRECT_DRIVE  Transmission_Type_MANUAL  \
0                               0                         1
1                               0                         1
2                               0                         1
3                               0                         1
4                               0                         1

   Transmission_Type_UNKNOWN
0                          0
1                          0
2                          0
3                          0
4                          0
Co efficient for Year is 0.12795086619034354
Co efficient for Engine_HP is -0.015142081226758822
Co efficient for Engine_Cylinders is -0.4874611334108649
Co efficient for highway_MPG is 0.4410894679555171
Co efficient for Popularity is 3.102779517592471e-05
Co efficient for MSRP is 8.390933189373478e-06
Co efficient for Transmission_Type_AUTOMATED_MANUAL is -10.972343474157594
Co efficient for Transmission_Type_AUTOMATIC is -11.256303676369456
Co efficient for Transmission_Type_DIRECT_DRIVE is 45.812234118377674
Co efficient for Transmission_Type_MANUAL is -11.211947388437244
Co efficient for Transmission_Type_UNKNOWN is -12.371639579413452
Intercept is -232.22076881867463
Traceback (most recent call last):
  File "C:\Users\mridu\OneDrive\Desktop\Pesu_io_ai\Week_2\Prog.py", line 38, in <module>
    print(model.score(X_test, y_test))
  File "C:\Users\mridu\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\base.py", line 386, in score
    return r2_score(y, self.predict(X), sample_weight=sample_weight,
  File "C:\Users\mridu\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\linear_model\base.py", line 256, in predict
    return self._decision_function(X)
  File "C:\Users\mridu\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\linear_model\base.py", line 239, in _decision_function
    X = check_array(X, accept_sparse=['csr', 'csc', 'coo'])
  File "C:\Users\mridu\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\utils\validation.py", line 453, in check_array
    _assert_all_finite(array)
  File "C:\Users\mridu\AppData\Local\Programs\Python\Python36\lib\site-packages\sklearn\utils\validation.py", line 44, in _assert_all_finite
    " or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
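Edit: The dataset that I'm using. Also note that there are a few columns I have not used, and I cannot send the source itself because I have altered the dataset to remove the rows with null values.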
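Since the ValueError is raised by check_array rejecting non-finite input during predict, one way to double-check the "no null values" assumption is to count NaN and infinite entries per column just before scoring. The following is only a minimal sketch, not a confirmed diagnosis: report_non_finite is a hypothetical helper name, and demo is a made-up stand-in for X_test, since the real values come from cars.csv.

```python
import numpy as np
import pandas as pd


def report_non_finite(frame):
    """Count NaN or +/-inf entries in each numeric column of a DataFrame."""
    numeric = frame.select_dtypes(include=[np.number])
    # np.isfinite is False for NaN, +inf and -inf, the values the
    # "Input contains NaN, infinity ..." message refers to.
    bad = ~np.isfinite(numeric.to_numpy(dtype="float64"))
    return pd.Series(bad.sum(axis=0), index=numeric.columns)


# Hypothetical stand-in for X_test; the values are invented for illustration.
demo = pd.DataFrame({
    "Engine_HP": [335.0, np.nan, 300.0],
    "Engine_Cylinders": [6.0, 6.0, np.inf],
    "highway_MPG": [26, 28, 28],
})

counts = report_non_finite(demo)
# Show only the columns that contain at least one non-finite value.
print(counts[counts > 0])
```

Running report_non_finite(X_train) and report_non_finite(X_test) on the real frames should show whether any column still holds a missing or infinite value after the null rows were removed. Because fit succeeded and only score failed, it may be worth looking at the test split in particular.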