Question

I am taking Udacity's Intro to Machine Learning course and am having some trouble with one of the exercises (Lesson 8, Quiz 12).

The problem statement is as follows:

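In outliers/outlier_cleaner.py, you will find the skeleton for a function called outlierCleaner() that you will fill in with a cleaning algorithm. It takes three arguments: predictions is a list of predicted targets that come from your regression, ages is the list of ages in the training set, and net_worths is the actual value of the net worths in the training set. There should be 90 elements in each of these lists (because the training set has 90 points in it). Your job is to return a list called cleaned_data that has only 81 elements in it, which are the 81 training points where the predictions and the actual values (net_worths) have the smallest errors (90 * 0.9 = 81). The format of cleaned_data should be a list of tuples, where each tuple has the form (age, net_worth, error).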

Here is what I have so far:

#!/usr/bin/python3

import numpy as np

p_test = np.array([1,2,3,4,50])
a_test = np.array([32,31,44,22,20])
n_test = np.array([2,3,4,5,6])

def outlierCleaner(predictions, ages, net_worths):
    """
        clean away the 10% of points that have the largest
        residual errors (different between the prediction
        and the actual net worth)

        return a list of tuples named cleaned_data where 
        each tuple is of the form (age, net_worth, error)
    """

    cleaned_data = []

    differences = predictions - net_worths

    cleaned_data = zip(ages, net_worths, differences)

    cleaned_data = sorted(cleaned_data, key=lambda x: x[2][0], reverse=True)

    limit = int(len(net_worths)*0.1)

    predictions.sort(axis=0)
    net_worths.sort(axis=0)


    return cleaned_data[limit:]

print(outlierCleaner(p_test,a_test,n_test))

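It took me a while to figure out how to use the zip() function. As far as I can tell, this should return a list of 4 elements built from the a_test[] list.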
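For reference, a small sketch of what zip() produces and how many points the 10% cutoff removes. The list names and the absolute-error values below are illustrative (derived by hand from the test arrays above) and are not part of the course code.

```python
# Illustrative values only: absolute errors |prediction - net_worth|
# for the five test points used in the question.
ages = [32, 31, 44, 22, 20]
net_worths = [2, 3, 4, 5, 6]
errors = [1, 1, 1, 1, 44]

# In Python 3, zip() returns a lazy iterator; list() materializes it
# into a list of (age, net_worth, error) tuples.
rows = list(zip(ages, net_worths, errors))

# Number of points dropped by an int(n * 0.1) cutoff for two sizes:
# the 5-point test set and the 90-point training set from the quiz.
drop_counts = {n: int(n * 0.1) for n in (5, 90)}
```

With only 5 points, int(5 * 0.1) is 0, so a slice such as cleaned_data[limit:] removes nothing; with 90 points it removes 9 and leaves 81.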

But I get an error that I have not been able to debug:

Traceback (most recent call last):
  File "/Users/xxx/Documents/udacity_ml/python3/ud120-projects/outliers/outlier_cleaner.py", line 37, in <module>
    print(outlierCleaner(p_test,a_test,n_test))
  File "/Users/xxx/Documents/udacity_ml/python3/ud120-projects/outliers/outlier_cleaner.py", line 25, in outlierCleaner
    cleaned_data = sorted(cleaned_data, key=lambda x: x[2][0], reverse=True)
  File "/Users/xxx/Documents/udacity_ml/python3/ud120-projects/outliers/outlier_cleaner.py", line 25, in <lambda>
    cleaned_data = sorted(cleaned_data, key=lambda x: x[2][0], reverse=True)
IndexError: invalid index to scalar variable.
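A likely cause, offered as a sketch and not as the course's reference solution: the test arrays are 1-D, so each x[2] inside the tuple is a NumPy scalar and x[2][0] cannot index it. The key x[2][0] would only make sense if the inputs were column arrays of shape (n, 1), which is an assumption about how the course script passes its data. The function name outlier_cleaner_sketch and the keep_fraction parameter below are hypothetical. The sketch also ranks by absolute error, whereas the code above sorts signed differences.

```python
import numpy as np

p_test = np.array([1, 2, 3, 4, 50])
a_test = np.array([32, 31, 44, 22, 20])
n_test = np.array([2, 3, 4, 5, 6])

# Reproduce the failure: an element of a 1-D array is a scalar.
first_difference = (p_test - n_test)[0]
try:
    first_difference[0]
except IndexError as exc:
    scalar_index_error = str(exc)  # same message as in the traceback


def outlier_cleaner_sketch(predictions, ages, net_worths, keep_fraction=0.9):
    """Keep the keep_fraction of points with the smallest absolute error.

    Hypothetical helper: flattens its inputs so that both 1-D arrays
    and (n, 1) column arrays are accepted.
    """
    predictions = np.asarray(predictions, dtype=float).ravel()
    ages = np.asarray(ages).ravel()
    net_worths = np.asarray(net_worths, dtype=float).ravel()

    # Absolute residual, so large negative errors also count as outliers.
    errors = np.abs(predictions - net_worths)

    # round() guards against float results such as 80.99999 for n * fraction.
    n_keep = int(round(len(errors) * keep_fraction))

    # Indices of the n_keep smallest errors; stable sort keeps ties in order.
    order = np.argsort(errors, kind="stable")[:n_keep]
    return [(ages[i].item(), net_worths[i].item(), errors[i].item())
            for i in order]


# With 5 points, keeping 80% drops exactly the one large-error point.
cleaned = outlier_cleaner_sketch(p_test, a_test, n_test, keep_fraction=0.8)
```

Called with 90 points and the default keep_fraction of 0.9, the same function returns 81 tuples, which matches the size the quiz asks for.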

Struggling to remove outliers

0 Answers: