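I have written a script that builds a random forest regression model.
The problem is that my precision and F1 scores both come out as 1.00, and none of the changes I make affects them. Changing the model type, the test size, or the rows and columns included in the dataset leaves the scores the same.
I suspect I am doing something wrong. I would like to know under what circumstances this can happen.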
Current results:
Report:
              precision    recall  f1-score   support
           1       1.00      1.00      1.00         1
   micro avg       1.00      1.00      1.00         1
   macro avg       1.00      1.00      1.00         1
weighted avg       1.00      1.00      1.00         1
Accuracy: 1.0
The script is as follows:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from sklearn import preprocessing
dataset = pd.read_csv("./Data/Assignment2DataSets/216037514.csv")
print(len(dataset))
dataset["RainTomorrow"] = dataset["RainTomorrow"].astype('category')
dataset["RainTomorrow"] = dataset["RainTomorrow"].cat.codes
dataset.dropna(inplace=True)
dataset = pd.get_dummies(dataset, columns=["Date", "Location", "RainToday", "WindGustDir", "WindDir9am", "WindDir3pm"], prefix=["Date", "Loc", "RTod", "WGD", "WD9am", "WD3pd"])
X = dataset.drop('RainTomorrow', axis=1)
y = dataset['RainTomorrow'] # Line must stay the same
# train_test_split(X,y,test_size=0.20, random_state=STUDENTNUMBER)  # template line; STUDENTNUMBER is undefined, so it is commented out
X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.20, random_state=216037514)
classifier = RandomForestRegressor(n_estimators = 100, random_state = 216037514)
classifier.fit(X_train,y_train) # Line must stay the same
y_pred = classifier.predict(X_test) # Line must stay the same
print("Report:\n", classification_report(y_test,y_pred))
print("Accuracy: ", accuracy_score(y_test,y_pred))
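The report above shows a support of 1, meaning the test set contains a single row. As a way to explore when that can produce perfect scores, here is a minimal sketch on made-up data. The helper `summarize_split`, the column names `a`, `b`, `c`, and the five-row dataset are all hypothetical and only stand in for a dataset that has shrunk to a few rows sharing one label. It is not a diagnosis of the real CSV.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def summarize_split(y_train, y_test):
    """Return row counts and distinct labels for a train/test split (hypothetical helper)."""
    return {
        "n_train": len(y_train),
        "n_test": len(y_test),
        "train_labels": sorted(pd.unique(y_train).tolist()),
        "test_labels": sorted(pd.unique(y_test).tolist()),
    }


# Hypothetical stand-in data: 5 rows that all carry the label 1.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(5, 3)), columns=["a", "b", "c"])
y = pd.Series([1, 1, 1, 1, 1], name="RainTomorrow")

# With 5 rows and test_size=0.20, exactly one row lands in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)
summary = summarize_split(y_train, y_test)
print(summary)

# A regressor trained on a constant target predicts that constant,
# so the single test row is trivially "correct".
model = RandomForestRegressor(n_estimators=10, random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
score = accuracy_score(y_test, y_pred)
print("Accuracy:", score)
```

Running the same kind of summary on the real `y_train` and `y_test`, and printing `len(dataset)` again after `dropna`, would show whether the actual data is in a comparable state.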