I'm quite new to Python and am working with the following DataFrames:
import pandas as pd

data1 = {'Store_ID': ['1', '1', '1', '1', '2', '2', '2', '3', '3'],
         'YearMonth': [201801, 201802, 201805, 201904, 201812, 201902, 201906, 201904, 201907],
         'AVG_Rating': [5.0, 4.5, 4.0, 3.5, 3.0, 4.5, 4.0, 2.5, 4.0]}
df1 = pd.DataFrame(data1)
                   AVG_Rating
Store_ID AnoMes
1        201801           5.0
         201802           4.5
         201805           4.0
         201904           3.5
2        201812           3.0
         201902           4.5
         201906           4.0
3        201904           2.5
         201907           4.0
data2 = {'Client_ID': ['1212', '1234', '1122', '1230'],
         'Store_ID': ['1', '1', '2', '3'],
         'YearMonth': [201804, 201906, 201904, 201906]}
df2 = pd.DataFrame(data2)
          Client_ID  YearMonth
Store_ID
1              1212     201804
1              1234     201906
2              1122     201904
3              1230     201906
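I set Store_ID as the index on both DataFrames.
I need to merge the two, bringing in from df1 the most recent AVG_Rating relative to df2's YearMonth column, which is the month in which the client made a purchase at the store. My final DataFrame must look like this: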
          Client_ID  YearMonth  AVG_Rating
Store_ID
1              1212     201804         4.5  (the 201802 rating)
To do this, I'm trying to use the function below with apply, but it raises an error:
def get_previous_loja_rating(row):
    loja = df1[row['Loja_ID']]
    lst = loja[loja['AnoMes']] < df2[row['AnoMes']]
    return lst[-1]

df2['PREVIOUS_RATING_MEAN'] = df1['AnoMes'].apply(get_previous_loja_rating,axis=1)
KeyError: ('Loja_ID', 'occurred at index 1')
Could someone please help me?
Answer 0 (score: 0)
It looks like your code uses Portuguese key names (Loja_ID, AnoMes, etc.) while the data uses English ones. You will need to change them to Store_ID and YearMonth.
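If your real DataFrames do use names like Loja_ID and AnoMes, another option is to rename the columns once rather than edit the function. A minimal sketch, assuming a frame with those two column names (the sample values and the names df_pt and df_en are made up for illustration):

```python
import pandas as pd

# Hypothetical frame using the column names from the question's function
df_pt = pd.DataFrame({'Loja_ID': ['1', '2'], 'AnoMes': [201801, 201812]})

# Map each name to the one used in the sample data
df_en = df_pt.rename(columns={'Loja_ID': 'Store_ID', 'AnoMes': 'YearMonth'})
print(list(df_en.columns))
```

rename returns a new DataFrame, so df_pt keeps its original column names.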
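Answer 1 (score: 0)
I will use YearMonth instead of AnoMes as the column name. Your function fails for several reasons: it refers to columns that do not exist in the data, it indexes df1 with a value instead of filtering its rows, and it is applied to a column of df1 rather than to the rows of df2. As I understand it, for each client you want the average rating of the corresponding store from the most recent YearMonth at or before the purchase.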
df1
  Store_ID  YearMonth  AVG_Rating
0        1     201801         5.0
1        1     201802         4.5
2        1     201805         4.0
3        1     201904         3.5
4        2     201812         3.0
df2
  Client_ID Store_ID  YearMonth
0      1212        1     201804
1      1234        1     201906
2      1122        2     201904
3      1230        3     201906
def get_previous_loja_rating(row):
    # ratings that belong to this client's store
    loja = df1[df1['Store_ID'] == row['Store_ID']]
    # all YearMonth values less than or equal to the client's YearMonth
    lst = [i for i in loja['YearMonth'] if i <= row['YearMonth']]
    # AVG_Rating of the most recent of those YearMonth values
    return loja.loc[loja['YearMonth'] == max(lst), 'AVG_Rating'].iloc[0]

df2['AVG_Rating'] = df2.apply(get_previous_loja_rating, axis=1)
df2
  Client_ID Store_ID  YearMonth  AVG_Rating
0      1212        1     201804         4.5
1      1234        1     201906         3.5
2      1122        2     201904         4.5
3      1230        3     201906         2.5
This brings into your client DataFrame the average rating from the most recent YearMonth at or before each purchase.
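As an alternative to the row-by-row apply, pandas also has pd.merge_asof, which does this kind of "latest value at or before" join directly. A minimal sketch, assuming Store_ID is an ordinary column (not the index) in both frames and YearMonth is an integer column. The name result is just illustrative:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Store_ID': ['1', '1', '1', '1', '2', '2', '2', '3', '3'],
    'YearMonth': [201801, 201802, 201805, 201904, 201812,
                  201902, 201906, 201904, 201907],
    'AVG_Rating': [5.0, 4.5, 4.0, 3.5, 3.0, 4.5, 4.0, 2.5, 4.0],
})
df2 = pd.DataFrame({
    'Client_ID': ['1212', '1234', '1122', '1230'],
    'Store_ID': ['1', '1', '2', '3'],
    'YearMonth': [201804, 201906, 201904, 201906],
})

# merge_asof requires both frames to be sorted by the "on" key
result = pd.merge_asof(
    df2.sort_values('YearMonth'),
    df1.sort_values('YearMonth'),
    on='YearMonth',        # match on the purchase month ...
    by='Store_ID',         # ... but only within the same store
    direction='backward',  # take the last rating at or before the purchase
)
print(result)
```

Note that the result comes back sorted by YearMonth rather than in df2's original order. A client whose purchase predates the store's first rating would get NaN here, whereas the apply version would raise a ValueError from max() on an empty list.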