KeyError for a column in a pandas DataFrame

Time: 2019-11-16 22:46:58

Tags: python pandas

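I've run into a problem I can't make sense of. I wrote a function that takes a DataFrame as input and performs a number of cleaning steps on it. When I run the function, I get the error message KeyError: ('amount', 'occurred at index date'). This doesn't make sense to me, because amount is a column in my DataFrame.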

Here is some code, including a sample of the data:

import string

import pandas as pd

data = pd.DataFrame.from_dict({"date": ["10/31/2019","10/27/2019"], "amount": [-13.3, -6421.25], "vendor": ["publix","verizon"]})

#create cleaning function for dataframe
def cleaning_func(x):

    #convert the amounts to positive numbers
    x['amount'] =  x['amount'] * -1

    #convert dates to datetime for subsetting purposes
    x['date'] = pd.to_datetime(x['date'])

    #begin removing certain strings
    x['vendor'] = x['vendor'].str.replace("PURCHASE AUTHORIZED ON ","")
    x['vendor'] = x['vendor'].str.replace("[0-9]","")
    x['vendor'] = x['vendor'].str.replace("PURCHASE WITH CASH BACK $ . AUTHORIZED ON /","")

    #build table of punctuation and remove from vendor strings
    table = str.maketrans(dict.fromkeys(string.punctuation))  # OR {key: None for key in string.punctuation}
    x['vendor'] = x['vendor'].str.translate(table)

    return x

#this call raises the KeyError
clean_data = data.apply(cleaning_func)

If anyone can figure out why this error occurs, I would really appreciate it.

1 Answer:

Answer 0: (score: 1)

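Don't use apply here: it is slow and essentially loops over your DataFrame. With the default axis=0, DataFrame.apply passes each column to the function as a Series, so on the first call x is the date column, and x['amount'] looks for a row label 'amount' that does not exist. That is the KeyError reported as "occurred at index date". Instead, pass the DataFrame itself to the function and have it return the cleaned DataFrame, so that the vectorized methods operate on whole columns.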
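To see what apply actually hands to the function, here is a minimal sketch using the sample data from the question. The helper name `inspect_arg` and the `seen` list are illustrative only and are not part of the original code.

```python
import pandas as pd

data = pd.DataFrame({
    "date": ["10/31/2019", "10/27/2019"],
    "amount": [-13.3, -6421.25],
    "vendor": ["publix", "verizon"],
})

seen = []  # (column name, argument type) for each call made by apply


def inspect_arg(x):
    # With the default axis=0, x is one whole column as a Series.
    # Its index holds the row labels (0 and 1), not the column names.
    seen.append((x.name, type(x).__name__))
    return x


data.apply(inspect_arg)
print(seen)

# Looking up another column's name inside a single column fails
# the same way as in the question.
raised_key_error = False
try:
    data["date"]["amount"]
except KeyError as exc:
    raised_key_error = True
    print("KeyError:", exc)
```

Each call receives a Series named after one column, which is why the lookup of 'amount' fails as soon as the date column is processed.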

import string

import pandas as pd

def cleaning_func(df):

    #convert the amounts to positive numbers
    df['amount'] =  df['amount'] * -1

    #convert dates to datetime for subsetting purposes
    df['date'] = pd.to_datetime(df['date'])

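    #begin removing certain strings; regex is passed explicitly because its default changed to False in pandas 2.0
    #(the fixed phrases are assumed to be literal text, the digit pattern a regular expression)
    df['vendor'] = df['vendor'].str.replace("PURCHASE AUTHORIZED ON ","", regex=False)
    df['vendor'] = df['vendor'].str.replace("[0-9]","", regex=True)
    df['vendor'] = df['vendor'].str.replace("PURCHASE WITH CASH BACK $ . AUTHORIZED ON /","", regex=False)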

    #build table of punctuation and remove from vendor strings
    table = str.maketrans(dict.fromkeys(string.punctuation))  # OR {key: None for key in string.punctuation}
    df['vendor'] = df['vendor'].str.translate(table)

    return df

clean_df = cleaning_func(data)
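
Note that cleaning_func assigns to the columns of the frame it receives, so data itself is modified as well. If the raw data should stay untouched, one possible variant works on a copy. This is only a sketch: the name `clean_transactions`, the final `str.strip()` call, the explicit date format, and the first vendor string in the sample are assumptions for illustration, not part of the original post.

```python
import string

import pandas as pd


def clean_transactions(df):
    """Return a cleaned copy of df and leave the input frame unchanged."""
    out = df.copy()

    # Convert the amounts to positive numbers.
    out["amount"] = out["amount"] * -1

    # Parse the dates; the format is assumed from the sample data.
    out["date"] = pd.to_datetime(out["date"], format="%m/%d/%Y")

    # Remove fixed phrases (literal text) and digits (regular expression).
    vendor = out["vendor"].str.replace("PURCHASE AUTHORIZED ON ", "", regex=False)
    vendor = vendor.str.replace("[0-9]", "", regex=True)
    vendor = vendor.str.replace(
        "PURCHASE WITH CASH BACK $ . AUTHORIZED ON /", "", regex=False
    )

    # Strip punctuation, then any leftover surrounding whitespace (an addition).
    table = str.maketrans(dict.fromkeys(string.punctuation))
    out["vendor"] = vendor.str.translate(table).str.strip()

    return out


# Sample data from the question; the first vendor string is a hypothetical
# raw value chosen to exercise the string cleaning.
data = pd.DataFrame({
    "date": ["10/31/2019", "10/27/2019"],
    "amount": [-13.3, -6421.25],
    "vendor": ["PURCHASE AUTHORIZED ON 10/30 publix", "verizon"],
})

clean_df = clean_transactions(data)
print(clean_df)
```

Working on a copy costs some extra memory, but the function can then be re-run on the same input without flipping the sign of amount a second time.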