I have two DataFrames (df1 and df2), shown below:
In [4]:df1
Year Annual Counts
0 1979 45345
1 1980 15381
2 1981 32171
3 1982 30288
4 1983 50573
In [5]:df2
Year CanESM2 GFDL-ESM2M HadGEM2-ES365 IPSL-CM5A-MR NorESM1-M
0 1984 10645 48143 57366 26979 37603
1 1985 15918 17178 34617 21304 31956
2 1986 51790 44111 50017 29233 61203
3 1987 34039 14504 23136 35848 34688
4 1988 68641 67681 24322 39591 34553
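I want to combine the two DataFrames as follows, so that the Annual Counts values from df1 are repeated under every model column: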
Year CanESM2 GFDL-ESM2M HadGEM2-ES365 IPSL-CM5A-MR NorESM1-M
0 1979 45345 45345 45345 45345 45345
1 1980 15381 15381 15381 15381 15381
2 1981 32171 32171 32171 32171 32171
3 1982 30288 30288 30288 30288 30288
4 1983 50573 50573 50573 50573 50573
5 1984 10645 48143 57366 26979 37603
6 1985 15918 17178 34617 21304 31956
7 1986 51790 44111 50017 29233 61203
8 1987 34039 14504 23136 35848 34688
9 1988 68641 67681 24322 39591 34553
I have a simple solution:
import pandas as pd

# file1 and file2 hold the source data (not shown here)
df1 = pd.DataFrame(file1)
df2 = pd.DataFrame(file2)
df1_list = df1['Annual Counts'].tolist()
models = ['CanESM2','GFDL-ESM2M','HadGEM2-ES365','IPSL-CM5A-MR','NorESM1-M']
# one extended list per model: df1's counts followed by that model's df2 values
ext = {}
for m in models:
    ext[m] = df1_list + df2[m].tolist()
Any suggestions? Does pandas have a function that does this without creating multiple lists and then extending them?
Answer 0 (score: 2)
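Here is one approach: rename the column Annual Counts to CanESM2, set Year and CanESM2 as the index on both DataFrames, combine them with combine_first, and finally reset the index and forward-fill along axis=1 with ffill():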
(df1.rename(columns={'Annual Counts':'CanESM2'})
.set_index(['Year','CanESM2']).combine_first(df2.set_index(['Year','CanESM2']))
.reset_index().ffill(axis=1))
Another approach, using merge:
(df1.rename(columns={'Annual Counts':'CanESM2'})
.merge(df2,how='outer',on=['Year','CanESM2']).ffill(axis=1))
Year CanESM2 GFDL-ESM2M HadGEM2-ES365 IPSL-CM5A-MR NorESM1-M
0 1979.0 45345.0 45345.0 45345.0 45345.0 45345.0
1 1980.0 15381.0 15381.0 15381.0 15381.0 15381.0
2 1981.0 32171.0 32171.0 32171.0 32171.0 32171.0
3 1982.0 30288.0 30288.0 30288.0 30288.0 30288.0
4 1983.0 50573.0 50573.0 50573.0 50573.0 50573.0
5 1984.0 10645.0 48143.0 57366.0 26979.0 37603.0
6 1985.0 15918.0 17178.0 34617.0 21304.0 31956.0
7 1986.0 51790.0 44111.0 50017.0 29233.0 61203.0
8 1987.0 34039.0 14504.0 23136.0 35848.0 34688.0
9 1988.0 68641.0 67681.0 24322.0 39591.0 34553.0
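To see why the ffill(axis=1) step is needed, here is a minimal, self-contained sketch of the merge approach. The DataFrames are rebuilt by hand from the question's printed tables, and the names merged and filled are only illustrative. The rows that come from df1 have NaN in every model column except CanESM2, and the row-wise forward fill copies the CanESM2 value to the right.

```python
import pandas as pd

# Data reconstructed from the tables printed in the question.
df1 = pd.DataFrame({
    "Year": [1979, 1980, 1981, 1982, 1983],
    "Annual Counts": [45345, 15381, 32171, 30288, 50573],
})
df2 = pd.DataFrame({
    "Year": [1984, 1985, 1986, 1987, 1988],
    "CanESM2": [10645, 15918, 51790, 34039, 68641],
    "GFDL-ESM2M": [48143, 17178, 44111, 14504, 67681],
    "HadGEM2-ES365": [57366, 34617, 50017, 23136, 24322],
    "IPSL-CM5A-MR": [26979, 21304, 29233, 35848, 39591],
    "NorESM1-M": [37603, 31956, 61203, 34688, 34553],
})

# Outer merge on the two shared columns: df1's rows get NaN
# in the four model columns that df1 does not have.
merged = df1.rename(columns={"Annual Counts": "CanESM2"}).merge(
    df2, how="outer", on=["Year", "CanESM2"]
)
print(merged.isna().sum())

# Forward-fill along each row, so the CanESM2 value
# replaces the NaN entries to its right.
filled = merged.ffill(axis=1)
print(filled)
```

Because the fill runs across columns of different dtypes, the values come back as floats, which is why the output above shows 1979.0 rather than 1979.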
Answer 1 (score: 1)
Same idea as anky_91's answer: rename the column, but then use concat here and forward-fill along axis=1 with ffill:
pd.concat([df1.rename(columns={'Annual Counts':'CanESM2'}), df2],
ignore_index=True,
sort=False).ffill(axis=1)
Output:
Year CanESM2 GFDL-ESM2M HadGEM2-ES365 IPSL-CM5A-MR NorESM1-M
0 1979.0 45345.0 45345.0 45345.0 45345.0 45345.0
1 1980.0 15381.0 15381.0 15381.0 15381.0 15381.0
2 1981.0 32171.0 32171.0 32171.0 32171.0 32171.0
3 1982.0 30288.0 30288.0 30288.0 30288.0 30288.0
4 1983.0 50573.0 50573.0 50573.0 50573.0 50573.0
5 1984.0 10645.0 48143.0 57366.0 26979.0 37603.0
6 1985.0 15918.0 17178.0 34617.0 21304.0 31956.0
7 1986.0 51790.0 44111.0 50017.0 29233.0 61203.0
8 1987.0 34039.0 14504.0 23136.0 35848.0 34688.0
9 1988.0 68641.0 67681.0 24322.0 39591.0 34553.0
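The output above is in floats, while the desired result in the question uses integers. One possible follow-up, assuming every cell is filled after the forward fill (true for this data), is to append astype(int). The sketch below rebuilds the question's tables by hand, and the name combined is only illustrative.

```python
import pandas as pd

# Data reconstructed from the tables printed in the question.
df1 = pd.DataFrame({
    "Year": [1979, 1980, 1981, 1982, 1983],
    "Annual Counts": [45345, 15381, 32171, 30288, 50573],
})
df2 = pd.DataFrame({
    "Year": [1984, 1985, 1986, 1987, 1988],
    "CanESM2": [10645, 15918, 51790, 34039, 68641],
    "GFDL-ESM2M": [48143, 17178, 44111, 14504, 67681],
    "HadGEM2-ES365": [57366, 34617, 50017, 23136, 24322],
    "IPSL-CM5A-MR": [26979, 21304, 29233, 35848, 39591],
    "NorESM1-M": [37603, 31956, 61203, 34688, 34553],
})

combined = (
    pd.concat(
        [df1.rename(columns={"Annual Counts": "CanESM2"}), df2],
        ignore_index=True,
        sort=False,
    )
    .ffill(axis=1)   # copy CanESM2 rightward into the NaN cells of df1's rows
    .astype(int)     # assumption: no NaN remains, so the cast cannot fail
)
print(combined)
```

If a row could still contain NaN after the fill (for example, a missing Year), astype(int) would raise an error, so the cast is only safe when the data is known to be complete.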