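I'm trying out an ML example and it works for the most part, but when I run the code several times in a row, Python starts spitting out different predictions. I'm no ML expert, but this seems strange?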
# Example file from Google Developers: "Hello World - Machine Learning Recipes": YouTube: https://youtu.be/cKxRvEZd3Mw
# Category: Supervised Learning
# January 14, 2018
from sklearn import tree
# Declarations: Texture
bumpy = 0
smooth = 1
# Declarations: Labels
apple = 0
orange = 1
# Step(1): Collect training data
# Features: [Weight, Texture]
features = [[140, smooth], [130, smooth], [150, bumpy], [170, bumpy]]
# labels[i] is the class of the sample at features[i]
labels = [apple, apple, orange, orange]
# Step(2): Train Classifier: Decision Tree
# Create the decision tree object, then fit it ('find' patterns) on features and labels
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)
# Step(3): Make Predictions
# The predict method returns the class the decision tree assigns to each sample
result = clf.predict([[150, bumpy], [130, smooth], [125.5, bumpy], [110, smooth]])
# result = clf.predict([[150, bumpy]])
print("Step(3): Make Predictions: ")
for x in result:
    if x == 0:
        print("Apple")
        continue
    elif x == 1:
        print("Orange")
        continue
    # Fallback for any other label value (not reached with labels 0 and 1)
    print("Orange")
Answer 0 (score: 6)
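(Most?) decision tree algorithms have a random element, and your training set is very small, which can exaggerate the effect. In scikit-learn, the features are randomly permuted at each split, so when several splits are equally good, the one that gets chosen can differ between runs. Here both weight and texture separate the four training samples perfectly, so the tree may split on either one, and a sample such as [125.5, bumpy] is then classified differently depending on the split.
When you create the DecisionTreeClassifier, try setting random_state to some fixed integer. If you want repeatable results, you need to use the same seed value every time. The examples in the documentation use a random seed of zero: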
clf = DecisionTreeClassifier(random_state=0)
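To check this on the data from the question, here is a minimal sketch. The helper name train_and_predict and the range of seeds tried are my own choices for illustration, not part of the original example. It trains the tree twice with the same seed and compares the predictions, then looks at which feature the root node splits on for a handful of different seeds.

```python
from sklearn import tree

# Same encoding as in the question
bumpy, smooth = 0, 1
apple, orange = 0, 1

features = [[140, smooth], [130, smooth], [150, bumpy], [170, bumpy]]
labels = [apple, apple, orange, orange]
samples = [[150, bumpy], [130, smooth], [125.5, bumpy], [110, smooth]]


def train_and_predict(seed):
    """Hypothetical helper: fit a tree with a fixed seed.

    Returns the predictions for `samples` and the index of the feature
    used at the root split (0 = weight, 1 = texture).
    """
    clf = tree.DecisionTreeClassifier(random_state=seed)
    clf.fit(features, labels)
    root_feature = int(clf.tree_.feature[0])
    return clf.predict(samples).tolist(), root_feature


# Two runs with the same seed give the same predictions
first, _ = train_and_predict(0)
second, _ = train_and_predict(0)
print(first == second)  # True

# Different seeds may pick a different root feature, because both
# features separate the training data equally well
root_features = {train_and_predict(seed)[1] for seed in range(20)}
print(root_features)
```

If the set printed at the end contains both 0 and 1, the tree is sometimes splitting on weight and sometimes on texture, which would explain why only the [125.5, bumpy] sample changes between runs.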