Changing duplicate column names after merging multiple tables in Python

Time: 2019-12-03 18:17:55

Tags: python pandas numpy dataframe merge

I have merged 4 files into one.

The four tables look like this:

df1:
ID   name    location   case     pass
1    John      NY       tax       Y
2    Jack      NJ       payment   N
3    John      CA       remote    Y
4    Rose      MA       income    Y
df2:
ID   name    location   case   pass
1    John      NY       car     N
2    Jack      NJ       train   Y
3    John      CA       car     Y
4    Rose      MA       bike    N
df3:
ID   name    location   case     pass
1    John      NY       spring    Y
2    Jack      NJ       spring    Y
3    John      CA       fall      Y
4    Rose      MA       winter    N
df4:
ID   name    location   case    pass
1    John      NY       red      N
2    Jack      NJ       green    N
3    John      CA       yellow   Y
4    Rose      MA       yellow   Y
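
To make the example reproducible, the four tables above could be built as DataFrames roughly as follows. This is only a sketch: `shared` is a helper name introduced here, and the dtypes (integer `ID`, strings elsewhere) are an assumption, since the question only shows printed tables.

```python
import pandas as pd

# Key columns shared by all four tables (helper name is illustrative).
shared = {
    'ID': [1, 2, 3, 4],
    'name': ['John', 'Jack', 'John', 'Rose'],
    'location': ['NY', 'NJ', 'CA', 'MA'],
}

df1 = pd.DataFrame({**shared, 'case': ['tax', 'payment', 'remote', 'income'],
                    'pass': ['Y', 'N', 'Y', 'Y']})
df2 = pd.DataFrame({**shared, 'case': ['car', 'train', 'car', 'bike'],
                    'pass': ['N', 'Y', 'Y', 'N']})
df3 = pd.DataFrame({**shared, 'case': ['spring', 'spring', 'fall', 'winter'],
                    'pass': ['Y', 'Y', 'Y', 'N']})
df4 = pd.DataFrame({**shared, 'case': ['red', 'green', 'yellow', 'yellow'],
                    'pass': ['N', 'N', 'Y', 'Y']})
```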

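This is how I merged these tables:

from functools import reduce
import pandas as pd

dfs = [df1, df2, df3, df4]
df_final = reduce(lambda left, right: pd.merge(left, right, on=['ID', 'name', 'location']), dfs)

But the result is a bit hard to read. I need to change the following columns to specific column names. Can this be done while merging the tables?

case_x, case_y, pass_x, pass_y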

This is my expected output:

ID   name    location   case_x    pass_x   case_y   pass_y   case_x   pass_x   case_y   pass_y
1    John      NY       tax       Y        car      N        spring   Y        red      N
2    Jack      NJ       payment   N        train    Y        spring   Y        green    N
3    John      CA       remote    Y        car      Y        fall     Y        yellow   Y
4    Rose      MA       income    Y        bike     N        winter   N        yellow   Y

2 answers:

Answer 0: (score: 2)

My approach uses concat and pivot_table:

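import pandas as pd

names = ['money', 'trans', 'season', 'color']
dfs = [df1, df2, df3, df4]

# Tag each table with its label in a new 'source' column (not 'name', which
# already holds the person's name), stack the tables, then pivot the labels
# into columns.
new_df = (pd.concat([d.assign(source=n) for n, d in zip(names, dfs)])
            .pivot_table(index=['ID', 'name', 'location'],
                         columns='source',
                         values=['case', 'pass'],
                         aggfunc='first')
         )
# Flatten the two-level column index: ('case', 'money') becomes 'case_money'.
new_df.columns = [f'{x}_{y}' for x, y in new_df.columns]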
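
One thing to be aware of with this approach: pivot_table sorts the resulting columns alphabetically, so they will not come out in the order of the original tables. The sketch below shows one possible way to restore that order and turn the index back into ordinary columns. It uses two-row stand-ins for the question's tables; `make_frame`, `keys`, `ordered` and the `source` column name are illustrative names, not part of the answer.

```python
import pandas as pd

keys = ['ID', 'name', 'location']
names = ['money', 'trans', 'season', 'color']

def make_frame(cases, passes):
    """Build a two-row stand-in for one of the question's tables."""
    return pd.DataFrame({'ID': [1, 2], 'name': ['John', 'Jack'],
                         'location': ['NY', 'NJ'],
                         'case': cases, 'pass': passes})

dfs = [make_frame(['tax', 'payment'], ['Y', 'N']),
       make_frame(['car', 'train'], ['N', 'Y']),
       make_frame(['spring', 'spring'], ['Y', 'Y']),
       make_frame(['red', 'green'], ['N', 'N'])]

# Stack the labelled tables and pivot the labels into columns.
wide = (pd.concat([d.assign(source=n) for n, d in zip(names, dfs)])
          .pivot_table(index=keys, columns='source',
                       values=['case', 'pass'], aggfunc='first'))
wide.columns = [f'{value}_{source}' for value, source in wide.columns]

# Rebuild the column order table by table: case_money, pass_money, case_trans, ...
ordered = [f'{value}_{n}' for n in names for value in ('case', 'pass')]
wide = wide[ordered].reset_index()
print(wide.columns.tolist())
```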

Answer 1: (score: 1)

You can still use reduce, combined with the suffixes option of merge and list.pop:

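from functools import reduce
import pandas as pd

suff = ['_trans', '_season', '_color']
dfs = [df1, df2, df3, df4]
# Each merge pops the next suffix and applies it to the right-hand table's columns.
df_final = reduce(lambda left, right: pd.merge(left, right, on=['ID', 'name', 'location'],
                                               suffixes=('', suff.pop(0))), dfs)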

Out[1944]:
   ID  name location     case pass case_trans pass_trans case_season  \
0  1   John  NY       tax      Y    car        N          spring
1  2   Jack  NJ       payment  N    train      Y          spring
2  3   John  CA       remote   Y    car        Y          fall
3  4   Rose  MA       income   Y    bike       N          winter

  pass_season case_color pass_color
0  Y           red        N
1  Y           green      N
2  Y           yellow     Y
3  N           yellow     Y

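Note: be careful with the list suff. Each call to pop removes an element from it, so you need to re-initialize the list before re-running the code.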
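
If re-initializing the list is a nuisance, one possible alternative is to pair each table with its suffix using zip in a plain loop, which leaves the list untouched. This is a sketch rather than part of the answer: `merge_with_suffixes` is a hypothetical helper, and the two-row frames are stand-ins for the question's tables.

```python
import pandas as pd

def merge_with_suffixes(frames, suffixes, keys):
    """Merge frames left to right; suffixes[i] labels the columns of frames[i + 1]."""
    merged = frames[0]
    for right, suffix in zip(frames[1:], suffixes):
        merged = pd.merge(merged, right, on=keys, suffixes=('', suffix))
    return merged

keys = ['ID', 'name', 'location']
base = {'ID': [1, 2], 'name': ['John', 'Jack'], 'location': ['NY', 'NJ']}
frames = [pd.DataFrame({**base, 'case': cases, 'pass': passes})
          for cases, passes in [
              (['tax', 'payment'], ['Y', 'N']),
              (['car', 'train'], ['N', 'Y']),
              (['spring', 'spring'], ['Y', 'Y']),
              (['red', 'green'], ['N', 'N']),
          ]]
suff = ['_trans', '_season', '_color']

first = merge_with_suffixes(frames, suff, keys)
second = merge_with_suffixes(frames, suff, keys)  # re-running is safe: suff is unchanged
print(first.columns.tolist())
```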


If you want the first case and pass columns to carry the _money suffix, just chain an additional rename:

df_final = (reduce(lambda left,right: pd.merge(left,right,on=['ID','name','location'], 
                                          suffixes=('', suff.pop(0))), dfs)
                 .rename({'case': 'case_money', 'pass': 'pass_money'}, axis=1))

Out[1951]:
   ID  name location case_money pass_money case_trans pass_trans case_season  \
0  1   John  NY       tax        Y          car        N          spring
1  2   Jack  NJ       payment    N          train      Y          spring
2  3   John  CA       remote     Y          car        Y          fall
3  4   Rose  MA       income     Y          bike       N          winter

  pass_season case_color pass_color
0  Y           red        N
1  Y           green      N
2  Y           yellow     Y
3  N           yellow     Y

This way, you only need to rename the first set of case and pass columns; the suffixes option of merge has already named all the other sets.
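
A different technique from the one above, shown only as a sketch: label the non-key columns of every table before merging, so that no columns collide and neither suffixes nor a final rename is needed. `label_columns`, `keys` and `labels` are names introduced here for illustration, and the two-row frames are stand-ins for the question's tables.

```python
from functools import reduce
import pandas as pd

keys = ['ID', 'name', 'location']
labels = ['money', 'trans', 'season', 'color']

def label_columns(frame, label):
    """Append _label to every column that is not a merge key."""
    return frame.rename(columns=lambda col: col if col in keys else f'{col}_{label}')

base = {'ID': [1, 2], 'name': ['John', 'Jack'], 'location': ['NY', 'NJ']}
frames = [pd.DataFrame({**base, 'case': cases, 'pass': passes})
          for cases, passes in [
              (['tax', 'payment'], ['Y', 'N']),
              (['car', 'train'], ['N', 'Y']),
              (['spring', 'spring'], ['Y', 'Y']),
              (['red', 'green'], ['N', 'N']),
          ]]

# After labelling, only the key columns overlap, so a plain merge is enough.
labelled = [label_columns(frame, label) for frame, label in zip(frames, labels)]
df_final = reduce(lambda left, right: pd.merge(left, right, on=keys), labelled)
print(df_final.columns.tolist())
```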