在线性回归程序上调用方法后,Python中的键错误0

时间:2018-11-01 20:08:42

标签: python pandas dataframe

想知道为什么我会收到此错误。该程序本身只是一个基于少量数据集的简单线性回归程序。窥视数据似乎已正确格式化,尽管在我运行该数据时会得到一个关键错误0。真的不确定是什么引起了问题。

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt  
    %matplotlib inline

houses = pd.read_csv('/home/devin/Desktop/machineLearning/houses.csv')
houseData = pd.DataFrame(houses)

#x contains the infor on parameters 
x = houseData.drop('price (grands)', axis = 1)
y = houseData['price (grands)']

def cost_func(x, y, weight, bias):
    xLength = len(x)
    total_error = 0.0
    for i in range(xLength):
        total_error += (y[i] - (weight*x[i] + bias))**2
    return total_error / xLength

def update_weights(x, y, weight, bias, learnRate):
   #initialize derivative values
    weight_deriv = 0
    bias_deriv = 0
    xLength = len(x)
    #calculate partial derivates for our hyperparameters 
    for i in range(xLength):
        # Calculate partial derivatives
        # -2x(y - (mx + b))
        weight_deriv += -2*x[i] * (y[i] - (weight*x[i] + bias))

        # -2(y - (mx + b))
        bias_deriv += -2*(y[i] - (weight*x[i] + bias))


    weight -= (weight_deriv / xLength) * learnRate
    bias -= (bias_deriv / xLength) * learnRate

    return weight, bias

def train(x, y, weight, bias, learnRate, epochs):
    cost_history = []

    for i in range(epochs):
        weight,bias = update_weights(x, y, weight, bias, learnRate)

        #Calculate cost for auditing purposes
        cost = cost_func(x,y,weight,bias)
        cost_history.append(cost)

        # Log Progress
        if i % 10 == 0:
            print ("iter: "+str(i) + " cost: "+str(cost) )

    return list(weight, bias, cost_history)


learnRate = 0.0001
initial_bias = 0 # initial y-intercept guess
initial_weight = 0 # initial slope guess
epochs = 10
print ("Running...")    

result = list(train(x, y, initial_weight, initial_bias, learnRate, epochs))
> Running...
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py in
     

get_loc(自身,键,方法,公差)          3077试试:       -> 3078返回self._engine.get_loc(key)          3079除了KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-46-a6b324fbb14b> in <module>()
      7 print ("Running...")
      8 
----> 9 result = list(train(x, y, initial_weight, initial_bias, learnRate, epochs))

<ipython-input-25-932e205a8590> in train(x, y, weight, bias, learnRate, epochs)
      4 
      5     for i in range(epochs):
----> 6         weight,bias = update_weights(x, y, weight, bias, learnRate)
      7 
      8         #Calculate cost for auditing purposes

<ipython-input-6-59d0fff0ef91> in update_weights(x, y, weight, bias, learnRate)
     14         # Calculate partial derivatives
     15         # -2x(y - (mx + b))
---> 16         weight_deriv += -2*x[i] * (y[i] - (weight*x[i] + bias))
     17 
     18         # -2(y - (mx + b))

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2686             return self._getitem_multilevel(key)
   2687         else:
-> 2688             return self._getitem_column(key)
   2689 
   2690     def _getitem_column(self, key):

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2693         # get column
   2694         if self.columns.is_unique:
-> 2695             return self._get_item_cache(key)
   2696 
   2697         # duplicate columns & possible reduce dimensionality

~/anaconda3/lib/python3.6/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   2487         res = cache.get(item)
   2488         if res is None:
-> 2489             values = self._data.get(item)
   2490             res = self._box_item_values(item, values)
   2491             cache[item] = res

~/anaconda3/lib/python3.6/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   4113 
   4114             if not isna(item):
-> 4115                 loc = self.items.get_loc(item)
   4116             else:
   4117                 indexer = np.arange(len(self.items))[isna(self.items)]

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py in
     

get_loc(自身,键,方法,公差)          3078返回self._engine.get_loc(key)          3079除了KeyError:       -> 3080返回self._engine.get_loc(self._maybe_cast_indexer(key))          3081          3082 indexer = self.get_indexer([key],method = method,tolerance = tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

3 个答案:

答案 0 :(得分:0)

我不知道这是在做什么,但是:

[...]
xLength = len(x)
#calculate partial derivates for our hyperparameters 
for i in range(xLength):
   # Calculate partial derivatives
   # -2x(y - (mx + b))
   weight_deriv += -2*x[i] * (y[i] - (weight*x[i] + bias))

您确定xy的长度相同吗?

weight_deriv += -2*x[i] * (y[i] - (weight*x[i] + bias))

否则,您可能有i中不存在的y ...

答案 1 :(得分:0)

您代码中的i是整数。但是x是数据帧,每列的名称都不同于整数。

我不确定为什么要自己编写代码,但是sklearn库内置了线性回归模块,可以对它们进行更好的优化。

答案 2 :(得分:0)

请注意,type中的xDataFrame;因此,如果您要在行上为x编制索引,可以使用.iloc进行索引。因此,将每个x[i]替换为x.iloc[i]

还有另一个小问题。这行

return list(weight, bias, cost_history)

将引发错误。您可以通过

解决它
return [weight, bias, cost_history]