Selecting a row from a DataFrame inside a lambda function

Time: 2018-06-06 07:05:00

Tags: python pandas lambda graphlab sframe

I am taking the Coursera course "Machine Learning: Classification". The assignments mainly use SFrame, but I am trying to solve them with Pandas.

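I ran into a problem caused by a difference between SFrame and Pandas. I can solve it with a for loop, but I would like to know whether there is a simpler or more efficient way. The assignment is to build a decision tree and compute its classification error.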

Original code:

def classify(tree, x, annotate = False):
    # if the node is a leaf node.
    if tree['is_leaf']:
        if annotate:
            print( "At leaf, predicting %s" % tree['prediction'])
        return tree['prediction']
    else:
        # split on feature.
        split_feature_value = x[tree['splitting_feature']]
        if annotate:
            print ("Split on %s = %s" % (tree['splitting_feature'], split_feature_value))
        if split_feature_value == 0:
            return classify(tree['left'], x, annotate)
        else:
            return classify(tree['right'], x, annotate)

def evaluate_classification_error(tree, data):
    # Apply the classify(tree, x) to each row in your data
    prediction = data.apply(lambda x: classify(tree, x))

    # Once you've made the predictions, calculate the classification error and return it
    diff = data["safe_loans"] - prediction
    return len(diff[diff != 0]) / (float)(len(diff))

In the evaluate_classification_error function, the lambda in data.apply(lambda x: classify(tree, x)) calls classify once per row. Inside classify, the split feature value is then read from that row with SFrame-style indexing: split_feature_value = x[tree['splitting_feature']]
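A small sketch of why this call may not carry over unchanged: pandas' DataFrame.apply passes each column to the function by default (axis=0) and passes each row only when axis=1 is given. The toy DataFrame and its values below are made up for illustration.

```python
import pandas as pd

# Toy data; column names mimic the assignment, values are invented
df = pd.DataFrame({'grade_A': [0, 1], 'safe_loans': [-1, 1]})

# Default axis=0: the lambda receives each COLUMN as a Series,
# so x.name is the column name
col_names = df.apply(lambda x: x.name).tolist()

# axis=1: the lambda receives each ROW as a Series indexed by column name,
# so x.name is the row's index label and x['grade_A'] is a single value
row_labels = df.apply(lambda x: x.name, axis=1).tolist()
row_values = df.apply(lambda x: x['grade_A'], axis=1).tolist()
```

With the default axis, x['grade_A'] inside the lambda would be a lookup by row label within a column, which is not what classify expects.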

In Pandas, however, I need to use iloc or ix to select a row before I can get split_feature_value. So I use a for loop in the evaluate_classification_error function, as follows:

My code:

def evaluate_classification_error(tree, data, target):
    # Apply the classify(tree, x) to each row in your data
    predictions = []
    for i in range(len(data)):
        prediction = classify(tree, data.iloc[i])
        predictions.append(prediction)
    # Once you've made the predictions, calculate the classification error and return it
    error = data[target] - predictions
    return len(error[error != 0]) / (float)(len(error))
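For comparison, here is a minimal sketch of the same computation without the explicit loop, assuming DataFrame.apply with axis=1 is acceptable for the assignment. The one-split tree and the toy data are hypothetical, and classify is a shortened copy of the function above without the annotate option.

```python
import pandas as pd

def classify(tree, x):
    # Same traversal as the classify function above, minus annotate
    if tree['is_leaf']:
        return tree['prediction']
    if x[tree['splitting_feature']] == 0:
        return classify(tree['left'], x)
    return classify(tree['right'], x)

def evaluate_classification_error(tree, data, target):
    # axis=1 hands each row to the lambda as a Series indexed by column name,
    # so x[tree['splitting_feature']] works as it does with SFrame
    predictions = data.apply(lambda x: classify(tree, x), axis=1)
    # Fraction of rows whose prediction differs from the label
    return (predictions != data[target]).mean()

# Hypothetical tree and data, for illustration only
tree = {
    'is_leaf': False,
    'splitting_feature': 'grade_A',
    'left': {'is_leaf': True, 'prediction': -1},
    'right': {'is_leaf': True, 'prediction': 1},
}
data = pd.DataFrame({'grade_A': [0, 1, 1, 0],
                     'safe_loans': [-1, 1, -1, -1]})
error = evaluate_classification_error(tree, data, 'safe_loans')
```

Row-wise apply still calls classify once per row in Python, so it is mainly shorter than the loop rather than necessarily faster.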

0 answers:

No answers yet