Extending pandas DataFrame columns using column values from another DataFrame

Time: 2019-06-22 12:16:29

Tags: python-3.x pandas

I have two dataframes (df1 and df2), as shown below:

In [4]:df1
   Year  Annual Counts
0  1979          45345
1  1980          15381
2  1981          32171
3  1982          30288
4  1983          50573
In [5]:df2
   Year  CanESM2  GFDL-ESM2M  HadGEM2-ES365  IPSL-CM5A-MR  NorESM1-M
0  1984    10645       48143          57366         26979      37603
1  1985    15918       17178          34617         21304      31956
2  1986    51790       44111          50017         29233      61203
3  1987    34039       14504          23136         35848      34688
4  1988    68641       67681          24322         39591      34553

I would like to combine the two dataframes as follows:

   Year  CanESM2  GFDL-ESM2M  HadGEM2-ES365  IPSL-CM5A-MR  NorESM1-M
0  1979    45345       45345          45345         45345      45345
1  1980    15381       15381          15381         15381      15381
2  1981    32171       32171          32171         32171      32171
3  1982    30288       30288          30288         30288      30288
4  1983    50573       50573          50573         50573      50573 
5  1984    10645       48143          57366         26979      37603
6  1985    15918       17178          34617         21304      31956
7  1986    51790       44111          50017         29233      61203
8  1987    34039       14504          23136         35848      34688
9  1988    68641       67681          24322         39591      34553

I have a naive solution:

import pandas as pd

df1 = pd.DataFrame(file1)
df1_list = df1['Annual Counts'].tolist()
df2 = pd.DataFrame(file2)
models = ['CanESM2','GFDL-ESM2M','HadGEM2-ES365','IPSL-CM5A-MR','NorESM1-M']
# one extended list per model, keyed by model name
ext = {}
for m in models:
    # start from the df1 counts, then append that model's df2 values
    ext[m] = df1_list + df2[m].tolist()

Any suggestions? Does pandas have a function that performs this task without creating multiple lists and then extending them?

2 answers:

Answer 0: (score: 2)

Here is one way: rename the column Annual Counts to CanESM2, set Year and CanESM2 as the index on both dataframes, use combine_first, and finally apply ffill() along axis=1:

(df1.rename(columns={'Annual Counts':'CanESM2'})
.set_index(['Year','CanESM2']).combine_first(df2.set_index(['Year','CanESM2']))
.reset_index().ffill(axis=1))

Another way, using merge:

(df1.rename(columns={'Annual Counts':'CanESM2'})
   .merge(df2,how='outer',on=['Year','CanESM2']).ffill(axis=1))

     Year  CanESM2  GFDL-ESM2M  HadGEM2-ES365  IPSL-CM5A-MR  NorESM1-M
0  1979.0  45345.0     45345.0        45345.0       45345.0    45345.0
1  1980.0  15381.0     15381.0        15381.0       15381.0    15381.0
2  1981.0  32171.0     32171.0        32171.0       32171.0    32171.0
3  1982.0  30288.0     30288.0        30288.0       30288.0    30288.0
4  1983.0  50573.0     50573.0        50573.0       50573.0    50573.0
5  1984.0  10645.0     48143.0        57366.0       26979.0    37603.0
6  1985.0  15918.0     17178.0        34617.0       21304.0    31956.0
7  1986.0  51790.0     44111.0        50017.0       29233.0    61203.0
8  1987.0  34039.0     14504.0        23136.0       35848.0    34688.0
9  1988.0  68641.0     67681.0        24322.0       39591.0    34553.0
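
As a self-contained check of the merge approach, the sketch below rebuilds df1 and df2 from the tables in the question. Note that ffill(axis=1) leaves every column as float, as the output above shows. The final astype(int) step is an addition that is not part of the answer; it assumes no missing values remain after the fill. The row-wise fill also relies on Year and CanESM2 sitting to the left of the other model columns, and it would fill any genuinely missing value in df2 as well.

```python
import pandas as pd

# Data copied from the question
df1 = pd.DataFrame({
    'Year': [1979, 1980, 1981, 1982, 1983],
    'Annual Counts': [45345, 15381, 32171, 30288, 50573],
})
df2 = pd.DataFrame({
    'Year': [1984, 1985, 1986, 1987, 1988],
    'CanESM2': [10645, 15918, 51790, 34039, 68641],
    'GFDL-ESM2M': [48143, 17178, 44111, 14504, 67681],
    'HadGEM2-ES365': [57366, 34617, 50017, 23136, 24322],
    'IPSL-CM5A-MR': [26979, 21304, 29233, 35848, 39591],
    'NorESM1-M': [37603, 31956, 61203, 34688, 34553],
})

merged = (
    df1.rename(columns={'Annual Counts': 'CanESM2'})
       .merge(df2, how='outer', on=['Year', 'CanESM2'])
       .ffill(axis=1)   # copy CanESM2 rightwards into the empty model columns
       .astype(int)     # assumption: nothing is NaN any more, so ints are safe
)
print(merged)
```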

Answer 1: (score: 1)

Same idea as anky_91's answer (rename the column), but here using concat followed by a forward fill with ffill (axis=1):

pd.concat([df1.rename(columns={'Annual Counts':'CanESM2'}), df2], 
           ignore_index=True, 
           sort=False).ffill(axis=1)

Output:

     Year  CanESM2  GFDL-ESM2M  HadGEM2-ES365  IPSL-CM5A-MR  NorESM1-M
0  1979.0  45345.0     45345.0        45345.0       45345.0    45345.0
1  1980.0  15381.0     15381.0        15381.0       15381.0    15381.0
2  1981.0  32171.0     32171.0        32171.0       32171.0    32171.0
3  1982.0  30288.0     30288.0        30288.0       30288.0    30288.0
4  1983.0  50573.0     50573.0        50573.0       50573.0    50573.0
5  1984.0  10645.0     48143.0        57366.0       26979.0    37603.0
6  1985.0  15918.0     17178.0        34617.0       21304.0    31956.0
7  1986.0  51790.0     44111.0        50017.0       29233.0    61203.0
8  1987.0  34039.0     14504.0        23136.0       35848.0    34688.0
9  1988.0  68641.0     67681.0        24322.0       39591.0    34553.0
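
If the float conversion is unwanted, one possible variation (not taken from either answer) is to widen df1 explicitly before concatenating, so that no NaN values are ever created and the integer dtypes survive. This is a minimal sketch that assumes every non-Year column of df2 should receive the Annual Counts values; the names model_cols and df1_wide are made up for illustration.

```python
import pandas as pd

# Data copied from the question
df1 = pd.DataFrame({
    'Year': [1979, 1980, 1981, 1982, 1983],
    'Annual Counts': [45345, 15381, 32171, 30288, 50573],
})
df2 = pd.DataFrame({
    'Year': [1984, 1985, 1986, 1987, 1988],
    'CanESM2': [10645, 15918, 51790, 34039, 68641],
    'GFDL-ESM2M': [48143, 17178, 44111, 14504, 67681],
    'HadGEM2-ES365': [57366, 34617, 50017, 23136, 24322],
    'IPSL-CM5A-MR': [26979, 21304, 29233, 35848, 39591],
    'NorESM1-M': [37603, 31956, 61203, 34688, 34553],
})

# Every column of df2 except Year is treated as a model column (assumption)
model_cols = df2.columns.drop('Year')

# Repeat the Annual Counts column once per model, then put Year back in front
df1_wide = pd.DataFrame({col: df1['Annual Counts'] for col in model_cols})
df1_wide.insert(0, 'Year', df1['Year'])

# Both frames now share the same columns, so a plain concat is enough
result = pd.concat([df1_wide, df2], ignore_index=True)
print(result)
```

Because no missing values appear at any point, there is no need for ffill and the result does not depend on column order.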