I want to predict electricity consumption using a random forest. After preparing the data, its current state is as follows:
X=df[['Temp(⁰C)','Araç Sayısı (adet)','Montaj V362_WH','Montaj V363_WH','Montaj_Temp','avg_humidity']]
X.head(15)
Output:
     Temp(⁰C)  Araç Sayısı (adet)  Montaj V362_WH  Montaj V363_WH  Montaj_Temp  avg_humidity
 0   3.250000                 0.0             0.0             0.0    17.500000     88.250000
 1   3.500000               868.0            16.0            18.0    20.466667     82.316667
 2   3.958333               774.0            18.0            18.0    21.166667     87.533333
 3   6.541667                 0.0             0.0             0.0    18.900000     83.916667
 4   4.666667               785.0            16.0            18.0    20.416667     72.650000
 5   2.458333               813.0            18.0            18.0    21.166667     73.983333
 6  -0.458333               804.0            16.0            18.0    20.500000     72.150000
 7  -1.041667               850.0            16.0            16.0    19.850000     76.433333
 8  -0.375000               763.0            16.0            18.0    20.500000     76.583333
 9   4.375000              1149.0            16.0            16.0    21.416667     84.300000
10   8.541667                 0.0             0.0             0.0    21.916667     71.650000
11   6.625000               763.0            16.0            18.0    22.833333     73.733333
12   5.333333               783.0            16.0            16.0    22.166667     69.250000
13   4.708333               764.0            16.0            18.0    21.583333     66.800000
14   4.208333               813.0            16.0            16.0    20.750000     68.150000
y.head(15)
Output:
Montaj_ET_kWh/day
0 11951.0
1 41821.0
2 42534.0
3 14537.0
4 41305.0
5 42295.0
6 44923.0
7 44279.0
8 45752.0
9 44432.0
10 25786.0
11 42203.0
12 40676.0
13 39980.0
14 39404.0
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.30, random_state=None)
clf = RandomForestRegressor(n_estimators=10000, random_state=0, n_jobs=-1)
clf.fit(X_train, y_train['Montaj_ET_kWh/day'])
for feature in zip(feature_list, clf.feature_importances_):
    print(feature)
Output:
('Temp(⁰C)', 0.11598075020423881)
('Araç Sayısı (adet)', 0.7047301384616493)
('Montaj V362_WH', 0.04065706901940535)
('Montaj V363_WH', 0.023077554218712878)
('Montaj_Temp', 0.08082006262985514)
('avg_humidity', 0.03473442546613837)
sfm = SelectFromModel(clf, threshold=0.10)
sfm.fit(X_train, y_train['Montaj_ET_kWh/day'])
for feature_list_index in sfm.get_support(indices=True):
    print(feature_list[feature_list_index])
Output:
Temp(⁰C)
Araç Sayısı (adet)
X_important_train = sfm.transform(X_train)
X_important_test = sfm.transform(X_test)
clf_important = RandomForestRegressor(n_estimators=10000, random_state=0, n_jobs=-1)
clf_important.fit(X_important_train, y_train)
y_test=y_test.values
y_pred = clf.predict(X_test)
y_test=y_test.reshape(-1,1)
y_pred=y_pred.reshape(-1,1)
y_test=y_test.ravel()
y_pred=y_pred.ravel()
label_encoder = LabelEncoder()
y_pred = label_encoder.fit_transform(y_pred)
y_test = label_encoder.fit_transform(y_test)
accuracy_score(y_test, y_pred)
Output:
0.010964912280701754
I don't know why the accuracy is so low, and I can't tell where I went wrong.
Answer 0 (score: 2)
Your mistake is asking for accuracy (a classification metric) in a regression setting, where it is meaningless.
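To illustrate why, here is a minimal sketch with made-up numbers (not the asker's data). Accuracy only counts exact matches between predicted and true labels, so predictions of a continuous target that are very close but not identical all count as wrong, while a regression metric such as mean absolute error reflects how close they actually are.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Hypothetical true values and predictions, invented for illustration only.
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_hat = np.array([100.4, 199.7, 300.2, 399.9])

# Accuracy is the fraction of exact matches; with continuous values
# even very close predictions almost never match exactly.
exact_match_rate = float(np.mean(y_true == y_hat))

# A regression metric measures the size of the errors instead.
mae = mean_absolute_error(y_true, y_hat)

print("exact-match rate:", exact_match_rate)
print("mean absolute error:", mae)
```

Here every prediction is within 0.4 of its true value, yet the exact-match rate is zero.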
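From the accuracy_score documentation (emphasis added):
sklearn.metrics.accuracy_score(y_true, y_pred, normalize=True, sample_weight=None)
Accuracy classification score.
Check the list of metrics available in scikit-learn for suitable regression metrics (there you can also confirm that accuracy is used only for classification); for more details, see the answer in Accuracy Score ValueError: Can't Handle mix of binary and continuous target.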
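As a rough sketch of what evaluating with regression metrics could look like, the example below trains a RandomForestRegressor on synthetic data and reports MAE, RMSE and R². The data, the coefficients and the smaller n_estimators value are assumptions chosen only to keep the example self-contained; in the question's code you would pass the real y_test and the output of clf.predict(X_test) to the same metric functions, without any LabelEncoder step.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): two features and a noisy linear target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1000, size=(300, 2))
y = 30.0 * X[:, 0] + 5.0 * X[:, 1] + rng.normal(0, 500, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0
)

# Fewer trees than in the question, purely to keep the example fast.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Regression metrics compare continuous values directly.
mae = mean_absolute_error(y_test, y_pred)
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
r2 = r2_score(y_test, y_pred)

print("MAE: ", mae)
print("RMSE:", rmse)
print("R^2: ", r2)
```

MAE and RMSE are expressed in the units of the target (kWh/day in the question), while R² is unitless and has a maximum of 1.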