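I am taking the Coursera course "Machine Learning: Classification." The assignments mainly use SFrame, but I am trying to complete them with Pandas instead.
Because of a difference between SFrame and Pandas, I ran into a problem. I was able to solve it with a for loop; however, I would like to know whether there is a simpler or more efficient way. This assignment is about building a decision tree and computing its classification error.
Original code: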
    def classify(tree, x, annotate=False):
        # if the node is a leaf node.
        if tree['is_leaf']:
            if annotate:
                print("At leaf, predicting %s" % tree['prediction'])
            return tree['prediction']
        else:
            # split on feature.
            split_feature_value = x[tree['splitting_feature']]
            if annotate:
                print("Split on %s = %s" % (tree['splitting_feature'], split_feature_value))
            if split_feature_value == 0:
                return classify(tree['left'], x, annotate)
            else:
                return classify(tree['right'], x, annotate)

    def evaluate_classification_error(tree, data):
        # Apply the classify(tree, x) to each row in your data
        prediction = data.apply(lambda x: classify(tree, x))
        # Once you've made the predictions, calculate the classification error and return it
        diff = data["safe_loans"] - prediction
        return len(diff[diff != 0]) / float(len(diff))
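In the evaluate_classification_error function, the lambda in data.apply(lambda x: classify(tree, x)) calls the classify function for each row, and classify uses SFrame-style indexing to pick out the split value: split_feature_value = x[tree['splitting_feature']].
In Pandas, however, I need to use iloc or ix to select a row before I can get split_feature_value. So I used a for loop in the evaluate_classification_error function, as shown below:
My code: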
    def evaluate_classification_error(tree, data, target):
        # Apply the classify(tree, x) to each row in your data
        predictions = []
        for i in range(len(data)):
            prediction = classify(tree, data.iloc[i])
            predictions.append(prediction)
        # Once you've made the predictions, calculate the classification error and return it
        error = data[target] - predictions
        return len(error[error != 0]) / float(len(error))
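
For comparison, here is a minimal sketch of a loop-free variant that may be what I am looking for. It relies on pandas' DataFrame.apply with axis=1, which passes each row to the function as a Series, so the x[tree['splitting_feature']] lookup inside classify works unchanged. The toy tree, the toy DataFrame, and the column name grade_A are hypothetical and only there to make the example runnable; classify is repeated without the annotate option to keep the block self-contained.

```python
import pandas as pd

def classify(tree, x):
    # Walk down the tree until a leaf is reached.
    # x can be any object indexable by feature name (here: a pandas row Series).
    if tree['is_leaf']:
        return tree['prediction']
    if x[tree['splitting_feature']] == 0:
        return classify(tree['left'], x)
    return classify(tree['right'], x)

def evaluate_classification_error(tree, data, target):
    # axis=1 applies the function to each row instead of each column
    predictions = data.apply(lambda row: classify(tree, row), axis=1)
    # Fraction of rows where the prediction differs from the true label
    return (predictions != data[target]).mean()

# Hypothetical toy tree and data, for illustration only
toy_tree = {
    'is_leaf': False,
    'splitting_feature': 'grade_A',
    'left': {'is_leaf': True, 'prediction': -1},
    'right': {'is_leaf': True, 'prediction': 1},
}
toy_data = pd.DataFrame({
    'grade_A': [0, 1, 1, 0],
    'safe_loans': [-1, 1, -1, -1],
})

toy_error = evaluate_classification_error(toy_tree, toy_data, 'safe_loans')
print(toy_error)
```

Note that apply with axis=1 still calls classify once per row in Python, so this is mainly shorter rather than necessarily faster than the explicit loop.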