Question

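I am taking the Coursera course "Machine Learning: Classification." The assignments mainly use SFrame, but I am trying to solve them with Pandas.

Because of the differences between SFrame and Pandas, I ran into a problem. I can work around it with a for loop; however, I would like to know whether there is a simpler or more efficient way. The assignment is to build a decision tree and compute its classification error.

Original code: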

def classify(tree, x, annotate = False):
    # if the node is a leaf node.
    if tree['is_leaf']:
        if annotate:
            print( "At leaf, predicting %s" % tree['prediction'])
        return tree['prediction']
    else:
        # split on feature.
        split_feature_value = x[tree['splitting_feature']]
        if annotate:
            print ("Split on %s = %s" % (tree['splitting_feature'], split_feature_value))
        if split_feature_value == 0:
            return classify(tree['left'], x, annotate)
        else:
            return classify(tree['right'], x, annotate)

def evaluate_classification_error(tree, data):
    # Apply the classify(tree, x) to each row in your data
    prediction = data.apply(lambda x: classify(tree, x))

    # Once you've made the predictions, calculate the classification error and return it
    diff = data["safe_loans"] - prediction
    return len(diff[diff != 0]) / float(len(diff))

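In the evaluate_classification_error function, the lambda in data.apply(lambda x: classify(tree, x)) calls classify once per row, and classify relies on SFrame-style row indexing to read the split value: split_feature_value = x[tree['splitting_feature']].

In Pandas, however, I need to use iloc or ix to select a row before I can get split_feature_value. So I use a for loop in the evaluate_classification_error function, as follows:

My code: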

def evaluate_classification_error(tree, data, target):
    # Apply the classify(tree, x) to each row in your data
    predictions = []
    for i in range(len(data)):
        prediction = classify(tree, data.iloc[i])
        predictions.append(prediction)
    # Once you've made the predictions, calculate the classification error and return it
    error = data[target] - predictions
    return len(error[error != 0]) / float(len(error))
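
One candidate for dropping the loop is DataFrame.apply with axis=1, which passes each row to the function as a Series indexed by column name, so x[tree['splitting_feature']] should work without iloc. The following is only a minimal sketch under that assumption: classify is a trimmed copy of the function above (no annotate option), and toy_tree, toy_data and the grade_A column are made-up names used only to exercise the code.

```python
import pandas as pd

def classify(tree, x):
    # Same traversal as the original classify, minus the annotate option.
    if tree['is_leaf']:
        return tree['prediction']
    if x[tree['splitting_feature']] == 0:
        return classify(tree['left'], x)
    return classify(tree['right'], x)

def evaluate_classification_error(tree, data, target):
    # axis=1 hands each row to the lambda as a Series keyed by column name.
    predictions = data.apply(lambda x: classify(tree, x), axis=1)
    # Fraction of rows whose prediction differs from the label.
    return (predictions != data[target]).mean()

# Hypothetical toy tree and data, not taken from the assignment.
toy_tree = {
    'is_leaf': False,
    'splitting_feature': 'grade_A',
    'left': {'is_leaf': True, 'prediction': -1},
    'right': {'is_leaf': True, 'prediction': 1},
}
toy_data = pd.DataFrame({
    'grade_A': [0, 1, 1, 0],
    'safe_loans': [-1, 1, -1, -1],
})
toy_error = evaluate_classification_error(toy_tree, toy_data, 'safe_loans')
print(toy_error)
```

Note that apply with axis=1 still calls classify once per row in Python, so this is mainly shorter rather than necessarily faster than the explicit loop.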

Select a row from a DataFrame in a lambda function

0 answers: