I have two dataframes. The first is df_states and the second is state_lookup.
df_states
state code score
0 Texas 0 0.753549
1 Pennsylvania 0 0.998119
2 California 1 0.125751
3 Texas 2 0.125751
state_lookup
state code_0 code_1 code_2
0 Texas 2014 2015 2019
1 Pennsylvania 2015 2016 2017
2 California 2014 2015 2019
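I want to create a new column named 'year' in df_states, filled from the code_* columns of the state_lookup table. For example, if Texas has code = 0, then according to state_lookup the year should be 2014. If Texas has code = 2, the year should be 2019.
The final result should look like this: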
df_states
state code score year
0 Texas 0 0.753 2014
1 Pennsylvania 0 0.998 2015
2 California 1 0.125 2015
3 Texas 2 0.125 2019
I tried iterating over each row with a for loop, but couldn't get it to work. How would you accomplish this?
Answer 0 (score: 2)
You can first use wide_to_long on the state_lookup df so that the two frames can be joined with merge:
import pandas as pd
s = pd.wide_to_long(state_lookup, stubnames="code", sep="_", i="state", j="year", suffix=r"\d+").reset_index()
# reset_index yields state, the 0/1/2 suffix, then the year value; rename so code = suffix and year = value
s.columns = ["state", "code", "year"]
print(df_states.merge(s, on=["state", "code"], how="left"))
state code score year
0 Texas 0 0.753549 2014
1 Pennsylvania 0 0.998119 2015
2 California 1 0.125751 2015
3 Texas 2 0.125751 2019
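For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same reshape-then-merge idea. The sample frames are typed in from the question (Pennsylvania's code_2 is assumed to be 2017), and the names long_lookup and code_id are just illustrative. The validate="many_to_one" argument is an optional extra that makes merge raise if the lookup ever contains a duplicate (state, code) pair.

```python
import pandas as pd

# Sample data copied from the question.
df_states = pd.DataFrame({
    "state": ["Texas", "Pennsylvania", "California", "Texas"],
    "code": [0, 0, 1, 2],
    "score": [0.753549, 0.998119, 0.125751, 0.125751],
})
state_lookup = pd.DataFrame({
    "state": ["Texas", "Pennsylvania", "California"],
    "code_0": [2014, 2015, 2014],
    "code_1": [2015, 2016, 2015],
    "code_2": [2019, 2017, 2019],  # Pennsylvania's 2017 is an assumption
})

# Wide -> long: one row per (state, code suffix) pair.
# "code_id" is a temporary, hypothetical name for the numeric suffix.
long_lookup = (
    pd.wide_to_long(state_lookup, stubnames="code", i="state",
                    j="code_id", sep="_", suffix=r"\d+")
    .rename(columns={"code": "year"})      # the stub column holds the year values
    .reset_index()
    .rename(columns={"code_id": "code"})   # the suffix is the code to join on
)

# Left join keeps every row of df_states; validate guards against duplicate keys.
result = df_states.merge(long_lookup, on=["state", "code"],
                         how="left", validate="many_to_one")
print(result)
```

Naming the suffix column explicitly avoids having to remember the column order that reset_index produces.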
Answer 1 (score: 1)
Load the dataframes:
import pandas as pd
df_states = pd.DataFrame({'state':['Texas','Pennsylvania','California','Texas'],'code':[0,0,1,2], 'score':[0.753549,0.998119,0.125751,0.125751]})
state_lookup = pd.DataFrame({'state':['Texas','Pennsylvania','California'],'code_0': [2014,2015,2014],'code_1': [2015,2016,2015] , 'code_2': [2019,2017,2019]})
First use melt to turn the code_ columns into rows:
melted_lookup = pd.melt(state_lookup,
id_vars=['state'],
value_vars=[col for col in state_lookup.columns if col.startswith('code_')],
var_name='new_code',
value_name='year')
Then build a matching key in df_states and merge the two dataframes:
df_states['new_code'] = "code_"+ df_states.code.astype('str')
df_states = pd.merge(df_states, melted_lookup, how = 'left', on =['new_code','state'])
# state code score new_code year
#0 Texas 0 0.753549 code_0 2014
#1 Pennsylvania 0 0.998119 code_0 2015
#2 California 1 0.125751 code_1 2015
#3 Texas 2 0.125751 code_2 2019
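If you would rather not reshape the lookup at all, one alternative (not what either answer above does) is to index the lookup's values directly by row and column position with NumPy. This is only a sketch: the sample frames are typed in from the question, with Pennsylvania's code_2 assumed to be 2017, and it assumes every state appears exactly once in state_lookup.

```python
import pandas as pd

# Sample data copied from the question.
df_states = pd.DataFrame({
    "state": ["Texas", "Pennsylvania", "California", "Texas"],
    "code": [0, 0, 1, 2],
    "score": [0.753549, 0.998119, 0.125751, 0.125751],
})
state_lookup = pd.DataFrame({
    "state": ["Texas", "Pennsylvania", "California"],
    "code_0": [2014, 2015, 2014],
    "code_1": [2015, 2016, 2015],
    "code_2": [2019, 2017, 2019],  # Pennsylvania's 2017 is an assumption
})

# Index the lookup by state so rows can be located by label.
lookup = state_lookup.set_index("state")

# Translate each row's state and code into integer positions in the lookup.
row_pos = lookup.index.get_indexer(df_states["state"].to_numpy())
col_names = ("code_" + df_states["code"].astype(str)).to_numpy()
col_pos = lookup.columns.get_indexer(col_names)

# get_indexer returns -1 for labels it cannot find; fail loudly instead of
# silently picking the last row or column.
if (row_pos < 0).any() or (col_pos < 0).any():
    raise KeyError("some state or code is missing from state_lookup")

# Pick one value per row of df_states with paired positional indexing.
df_states["year"] = lookup.to_numpy()[row_pos, col_pos]
print(df_states)
```

This avoids the helper column and the merge, at the cost of having to handle missing keys yourself, whereas a left merge would simply leave NaN.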