Question

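Details about my goal: I am using IMDb data and YouTube movie trailer data to predict a movie's gross revenue. Specifically, I use 'range', 'gross', 'budget', 'imdb_score', 'views', and 'rating' for classification.

So I ran both KNN and a decision tree 1) to predict a movie's success, and 2) to find out which machine learning method is more accurate.

The problem is that my KNN result is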

from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix

feature_columns = ['range', 'gross', 'budget', 'imdb_score', 'views', 'rating']
X = df2[feature_columns].values
y = df2['range'].values

knn = KNeighborsClassifier(n_neighbors=3, metric='euclidean')
knn.fit(X_train, y_train)

confusion_matrix(y_test, y_pred)

This ends up with a model accuracy of 94.0%, with the confusion matrix

array([[ 4,  0,  0,  0],
       [ 0, 27,  0,  0],
       [ 0,  1, 20,  0],
       [ 0,  0,  1,  7]])

KNN pairplot result

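For the decision tree,

from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_graphviz

feature_cols = ["budget", "imdb_score", "views", "rating"]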
y = df2.range
X = df2[feature_cols]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

clf = DecisionTreeClassifier(criterion="entropy")  # The default criterion is Gini; omit the criterion argument to use it

clf = clf.fit(X_train,y_train) #Training

y_pred = clf.predict(X_test) #Make a prediction

print("Accuracy:",metrics.accuracy_score(y_test, y_pred))

Final accuracy: 0.48333333333333334

from io import StringIO
import pydotplus
from IPython.display import Image

dot_data = StringIO()
export_graphviz(clf, out_file=dot_data,
                filled=True, rounded=True,
                special_characters=True, feature_names=feature_cols,
                class_names=["0", "1", "2", "3"])  # one name per class; there are four classes here
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_png('house.png')  # saved to the default location; in Colab this is the content folder
Image(graph.create_png())

Decision Tree result

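My questions are: 1) Do my results make sense for my topic? It is confusing that the KNN accuracy is 94% while the decision tree's is 48%; I expected the decision tree to be more accurate. 2) In particular, I am not sure whether the feature columns for KNN and the decision tree are the same, so that the two results are comparable. Were my features set up differently?

Thanks for reading!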

Answer 1

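In KNN, you also supply the target column range as part of the training data, which is why that model's accuracy is so high compared with the decision tree.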
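A minimal sketch of this effect, using made-up data rather than the asker's df2. The four feature columns are random noise, label stands in for range, and all columns are on a small, comparable scale. That last point is an assumption: with unscaled columns such as budget or views, Euclidean distances would be dominated by the large-valued columns instead. The helper knn_accuracy is hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n = 400

# Hypothetical stand-in data: features are pure noise, unrelated to the label.
features = rng.normal(scale=0.1, size=(n, 4))
label = rng.integers(0, 4, size=n)  # four classes, like range

X_clean = features
X_leaky = np.column_stack([label, features])  # target included as a feature


def knn_accuracy(X, y):
    """Fit a 3-nearest-neighbour classifier and return its test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=1
    )
    knn = KNeighborsClassifier(n_neighbors=3, metric="euclidean")
    knn.fit(X_train, y_train)
    return knn.score(X_test, y_test)


acc_clean = knn_accuracy(X_clean, label)
acc_leaky = knn_accuracy(X_leaky, label)
print("without target column:", acc_clean)
print("with target column:   ", acc_leaky)
```

With the target column included, the nearest neighbours of a test row are almost always rows with the same label, so the score says nothing about how well the other features predict it.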

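You left that key feature out of the decision tree's training data. So you should not compare the performance of the two algorithms, because their training data differ. Please try again using the same training data for both algorithms.

Note: please explain what gross and range mean.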
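One way such a like-for-like comparison could look, as a sketch only. The data here is synthetic (an assumption, since df2 is not available); with the real data, X would be df2[feature_cols] and y would be df2["range"], with the target kept out of the features. Both models are fitted and scored on the same split.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 300

# Hypothetical data: four numeric features, and a four-class label derived
# from the first feature only. Replace with df2[feature_cols] and df2["range"].
X = rng.normal(size=(n, 4))
y = np.digitize(X[:, 0], bins=[-0.7, 0.0, 0.7])  # classes 0..3

# One split, shared by both models, so the scores are comparable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)

models = {
    "knn": KNeighborsClassifier(n_neighbors=3, metric="euclidean"),
    "tree": DecisionTreeClassifier(criterion="entropy", random_state=1),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(name, scores[name])
```

Only when the feature columns and the split are identical does a gap between the two accuracies reflect the algorithms rather than the inputs.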

How to interpret the results of KNN and a decision tree?
