Question

I'm really new to Python and am working with the following dataframes:

    import pandas as pd

    data1 = {'Store_ID':['1','1','1','1','2','2','2','3','3'],
             'YearMonth':[201801,201802,201805,201904,201812,201902,201906,201904,201907],
             'AVG_Rating':[5.0,4.5,4.0,3.5,3.0,4.5,4.0,2.5,4.0]}

    df1 = pd.DataFrame(data1)

                    AVG_Rating
Store_ID    AnoMes
1           201801  5.0
            201802  4.5
            201805  4.0
            201904  3.5
2           201812  3.0
            201902  4.5
            201906  4.0
3           201904  2.5
            201907  4.0

    data2 = {'Client_ID':['1212','1234','1122','1230'],
             'Store_ID':['1','1','2','3'],
             'YearMonth':[201804,201906,201904,201906]}

    df2 = pd.DataFrame(data2)

            Client_ID   YearMonth
Store_ID
1           1212        201804
1           1234        201906
2           1122        201904
3           1230        201906

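I set Store_ID as the index on both DataFrames.

I need to merge the two, bringing in from df1 the most recent AVG_Rating as of the YearMonth in which the client made a purchase at the store. My final dataframe must be: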

            Client_ID   YearMonth   AVG_Rating
Store_ID
1           1212        201804      4.5  (rating from 201802)

To do this, I'm trying to use the following apply function, but an error occurs:

    def get_previous_loja_rating(row):
        loja = df1[row['Loja_ID']]
        lst = loja[loja['AnoMes']] < df2[row['AnoMes']]
        return lst[-1]

    df2['PREVIOUS_RATING_MEAN'] = df1['AnoMes'].apply(get_previous_loja_rating,axis=1)

KeyError: ('Loja_ID', 'occurred at index 1')

Can someone please help me?

Answer 1

It looks like your code uses Portuguese key names (Loja_ID, AnoMes, etc.) while the data uses English ones. You will need to change them to Store_ID and YearMonth.
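To see where the KeyError comes from, here is a minimal sketch. It assumes a cut-down copy of df2 from the question; the names `missing`, `indexed`, `has_store_column` and `restored` are only for illustration. It also shows that once Store_ID is set as the index it is no longer a column, which is worth checking since the question mentions doing that.

```python
import pandas as pd

# Cut-down copy of df2 from the question (illustrative only).
df2 = pd.DataFrame({
    'Client_ID': ['1212', '1234'],
    'Store_ID': ['1', '1'],
    'YearMonth': [201804, 201906],
})

# This is the kind of object apply(..., axis=1) hands to the function:
# a Series whose labels are the column names.
row = df2.iloc[0]
print(list(row.index))  # ['Client_ID', 'Store_ID', 'YearMonth']

missing = None
try:
    row['Loja_ID']  # this label does not exist in the row
except KeyError as err:
    missing = err.args[0]
    print('missing key:', missing)

# After set_index, Store_ID lives in the index, not in the columns.
indexed = df2.set_index('Store_ID')
has_store_column = 'Store_ID' in indexed.columns

# reset_index() turns it back into an ordinary column.
restored = indexed.reset_index()
```

If Store_ID is the index, either call `reset_index()` first or read the store from `row.name` inside the function.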

Answer 2

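I will use YearMonth instead of AnoMes as the column name. Your function fails for several reasons. As I understand it, you want to add to each client row the average rating from the most recent YearMonth of the corresponding store.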

df1
   Store_ID  YearMonth  AVG_Rating
0         1     201801         5.0
1         1     201802         4.5
2         1     201805         4.0
3         1     201904         3.5
4         2     201812         3.0
df2
   Client_ID  Store_ID  YearMonth
0       1212         1     201804
1       1234         1     201906
2       1122         2     201904
3       1230         3     201906


    def get_previous_loja_rating(row):
        # ratings for the client's store only
        loja = df1[df1['Store_ID'] == row['Store_ID']]
        # all YearMonth values less than or equal to the client's YearMonth
        lst = [i for i in loja['YearMonth'] if i <= row['YearMonth']]
        # AVG_Rating for the most recent of those YearMonth values
        return loja[loja['YearMonth'] == max(lst)]['AVG_Rating'].iloc[0]

    df2['AVG_Rating'] = df2.apply(get_previous_loja_rating, axis=1)

df2
   Client_ID  Store_ID  YearMonth  AVG_Rating
0       1212         1     201804         4.5
1       1234         1     201906         3.5
2       1122         2     201904         4.5
3       1230         3     201906         2.5

This brings the average rating for the closest preceding (or equal) YearMonth into your client dataframe.
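As an alternative to the row-by-row apply, the same lookup can probably be done with `pd.merge_asof`, which matches each left row to the last right row whose key is less than or equal to it. This is a different technique from the function above, offered as a sketch. It assumes Store_ID is an ordinary column in both frames (not the index) and that YearMonth has the same integer dtype in both; the name `merged` is just for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Store_ID': ['1', '1', '1', '1', '2', '2', '2', '3', '3'],
    'YearMonth': [201801, 201802, 201805, 201904, 201812,
                  201902, 201906, 201904, 201907],
    'AVG_Rating': [5.0, 4.5, 4.0, 3.5, 3.0, 4.5, 4.0, 2.5, 4.0],
})
df2 = pd.DataFrame({
    'Client_ID': ['1212', '1234', '1122', '1230'],
    'Store_ID': ['1', '1', '2', '3'],
    'YearMonth': [201804, 201906, 201904, 201906],
})

# merge_asof requires both frames to be sorted by the "on" key.
merged = pd.merge_asof(
    df2.sort_values(['YearMonth', 'Client_ID']),
    df1.sort_values('YearMonth'),
    on='YearMonth',          # ordered key to search on
    by='Store_ID',           # only match rows of the same store
    direction='backward',    # take the last rating at or before the purchase
)
print(merged)
```

For the sample data this gives the same ratings as the apply version (4.5, 3.5, 4.5 and 2.5 for clients 1212, 1234, 1122 and 1230), though the rows come back ordered by YearMonth. A client with no earlier rating for their store gets NaN instead of raising an error; passing `allow_exact_matches=False` would restrict the match to strictly earlier months.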

KeyError: ('1', 'occurred at index 0')
