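How can I add a new column to an existing dataframe by comparing it against another dataframe that is shorter and has a different index?
For example, if I have: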
df1 =
   country code  year
0  Armenia    a  2016
1   Brazil    b  2017
2   Turkey    c  2016
3  Armenia    d  2017
df2 =
  geoCountry  2016_gdp  2017_gdp
0    Armenia    10.499     10.74
1     Brazil  1,798.62  2,140.94
2     Turkey   857.429   793.698
In the end I want to end up with:
df1 =
   country code  year       gdp
0  Armenia    a  2016    10.499
1   Brazil    b  2017  2,140.94
2   Turkey    c  2016   857.429
3  Armenia    d  2017     10.74
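How would I go about this? I tried the answers outlined here and here, to no avail. I also tried the following, which takes far too long on a 90,000-row dataframe: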
for index, row in df1.iterrows():
    if row['country'] in list(df2.geoCountry):
        if row['year'] == 2016:
            df1['gdp'].append(df2[df2.geoCountry == str(row['country'])]['2016'])
        else:
            df1['gdp'].append(df2[df2.geoCountry == str(row['country'])]['2017'])
Answer 0 (score: 0)
I think this is what you are looking for:
# Reshape df2 from wide to long: one row per (country, year) pair
df2 = df2.melt(id_vars='geoCountry', value_vars=['2016_gdp', '2017_gdp'], var_name='year')
df1['year'] = df1['year'].astype('int')
# Keep only the four-digit year from column names such as '2016_gdp'
df2['year'] = df2['year'].str.slice(0, 4).astype('int')
df1.merge(df2, left_on=['country', 'year'], right_on=['geoCountry', 'year'])[['country', 'code', 'year', 'value']]
Output:
country code year value
0 Armenia a 2016 10.499
1 Brazil b 2017 2,140.94
2 Turkey c 2016 857.429
3 Armenia d 2017 10.74
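If you want to try the melt-then-merge approach end to end, here is a minimal self-contained sketch. The two frames are rebuilt from the question, with the GDP figures typed as plain floats (an assumption, since the question shows them with thousands separators). The name long_gdp and the value_name='gdp' argument are my additions so that the result column is called gdp rather than value.

```python
import pandas as pd

# Sample frames rebuilt from the question; GDP values assumed to be floats
df1 = pd.DataFrame({
    "country": ["Armenia", "Brazil", "Turkey", "Armenia"],
    "code": ["a", "b", "c", "d"],
    "year": [2016, 2017, 2016, 2017],
})
df2 = pd.DataFrame({
    "geoCountry": ["Armenia", "Brazil", "Turkey"],
    "2016_gdp": [10.499, 1798.62, 857.429],
    "2017_gdp": [10.74, 2140.94, 793.698],
})

# Wide to long: one row per (geoCountry, year) pair
long_gdp = df2.melt(id_vars="geoCountry", value_vars=["2016_gdp", "2017_gdp"],
                    var_name="year", value_name="gdp")
# '2016_gdp' -> 2016, so the key has the same integer dtype as df1['year']
long_gdp["year"] = long_gdp["year"].str.slice(0, 4).astype(int)

# Join on (country, year) and drop the duplicated country column
result = (
    df1.merge(long_gdp, left_on=["country", "year"], right_on=["geoCountry", "year"])
       .drop(columns="geoCountry")
)
print(result)
```

Because the lookup is a single vectorized merge rather than a Python loop over rows, it should scale much better to the 90,000-row frame mentioned in the question.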
Answer 1 (score: 0)
You mainly need the melt function:
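import pandas as pd

# Strip the "_gdp" suffix so the year columns are named '2016' and '2017'
df2.columns = df2.columns.str.split("_").str.get(0)
df2 = df2.rename(columns={"geoCountry": "country"})
df3 = pd.melt(df2, id_vars=['country'], value_vars=['2016', '2017'],
              var_name='year', value_name='gdp')
df3['year'] = df3['year'].astype(int)  # assumes df1['year'] holds integers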
After this, you just need to merge df1 with the df3 built above:
result = pd.merge(df1, df3, on=['country','year'])
Output:
pd.merge(df1, df3, on=['country','year'])
Out[36]:
country code year gdp
0 Armenia a 2016 10.499
1 Brazil b 2017 2140.940
2 Turkey c 2016 857.429
3 Armenia d 2017 10.740
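One thing to watch for: pd.merge defaults to an inner join, so any row of df1 whose country/year pair is missing from df3 is silently dropped. If you would rather keep those rows with NaN in gdp, a left join is one option. Below is a rough sketch under that assumption; the "Chile" row is a made-up example of a country with no GDP entry, and validate='many_to_one' is an optional check that each (country, year) pair appears only once in df3.

```python
import pandas as pd

# Hypothetical data: "Chile" has no matching row in df3
df1 = pd.DataFrame({
    "country": ["Armenia", "Brazil", "Chile"],
    "code": ["a", "b", "e"],
    "year": [2016, 2017, 2016],
})
# Long-format GDP table, shaped like the df3 produced by melt
df3 = pd.DataFrame({
    "country": ["Armenia", "Armenia", "Brazil", "Brazil"],
    "year": [2016, 2017, 2016, 2017],
    "gdp": [10.499, 10.74, 1798.62, 2140.94],
})

# how='left' keeps every row of df1; unmatched rows get NaN in 'gdp'.
# validate raises MergeError if df3 has duplicate (country, year) keys.
result = pd.merge(df1, df3, on=["country", "year"],
                  how="left", validate="many_to_one")
print(result)
```

With a left join the result has exactly as many rows as df1, which makes it safe to assign result['gdp'] back if df1 uses a default index.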