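Combining multiple columns of the same pandas DataFrame into one column

Time: 2019-06-03 23:58:15

Tags: pandas dataframe merge multiple-columns

My input dataset is shown below. I want to rename multiple columns to the shared variable names T1, T2, T3, T4, and then combine the columns that share a name into a single column.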

df
ID   Q3.4   Q3.6   Q3.8   Q3.18   Q4.4   Q4.6   Q4.8   Q4.12
1    NaN    NaN    NaN    NaN     20     60     80     20
2    10     20     20     40      NaN    NaN    NaN    NaN
3    30     40     40     40      NaN    NaN    NaN    NaN
4    NaN    NaN    NaN    NaN     50     50     50     50

rename vars
T1 = ['Q3.4', 'Q4.4']
T2 = ['Q3.6', 'Q4.6']
T3 = ['Q3.8', 'Q4.8']
T4 = ['Q3.18', 'Q4.12']

Step 1: I renamed the variables using df.rename (please tell me if there is faster code):

df.rename(columns = {'Q3.4': 'T1',
                     'Q4.4': 'T1'},
          inplace = True)

df.rename(columns = {'Q3.6': 'T2',
                     'Q4.6': 'T2'},
          inplace = True)

df.rename(columns = {'Q3.8': 'T3',
                     'Q4.8': 'T3'},
          inplace = True)

df.rename(columns = {'Q3.18': 'T4',
                     'Q4.12': 'T4'},
          inplace = True)

ID   T1   T2   T3   T4   T1   T2   T3   T4
1    NaN  NaN  NaN  NaN  20   60   80   20
2    10   20   20   40   NaN  NaN  NaN  NaN
3    30   40   40   40   NaN  NaN  NaN  NaN
4    NaN  NaN  NaN  NaN  50   50   50   50

How can I merge these columns into the expected df below?

ID   T1   T2   T3   T4
1    20   60   80   20
2    10   20   20   40
3    30   40   40   40
4    50   50   50   50

Thanks!

3 answers:

Answer 0 (score: 1)

Starting from your original df, use groupby with axis=1:

d={'Q3.4': 'T1','Q4.4': 'T1',
   'Q3.6': 'T2','Q4.6': 'T2',
   'Q3.8': 'T3','Q4.8': 'T3',
   'Q3.18': 'T4','Q4.12': 'T4'}
df.set_index('ID').groupby(d,axis=1).first()
Out[80]: 
      T1    T2    T3    T4
ID                        
1   20.0  60.0  80.0  20.0
2   10.0  20.0  20.0  40.0
3   30.0  40.0  40.0  40.0
4   50.0  50.0  50.0  50.0
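
As a possible variation on the above: recent pandas releases deprecate groupby(..., axis=1), so a minimal sketch that avoids it is to transpose, group the rows, and transpose back. The mapping d can also be built from the T1..T4 lists in the question instead of being typed out, which addresses the "faster rename" part. The names groups and result are illustrative only, and the sample frame is rebuilt from the question's table.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Q3.4": [np.nan, 10, 30, np.nan],
    "Q3.6": [np.nan, 20, 40, np.nan],
    "Q3.8": [np.nan, 20, 40, np.nan],
    "Q3.18": [np.nan, 40, 40, np.nan],
    "Q4.4": [20, np.nan, np.nan, 50],
    "Q4.6": [60, np.nan, np.nan, 50],
    "Q4.8": [80, np.nan, np.nan, 50],
    "Q4.12": [20, np.nan, np.nan, 50],
})

# Target name -> source columns, as listed in the question.
groups = {
    "T1": ["Q3.4", "Q4.4"],
    "T2": ["Q3.6", "Q4.6"],
    "T3": ["Q3.8", "Q4.8"],
    "T4": ["Q3.18", "Q4.12"],
}

# Invert to the old-name -> new-name mapping used by the answer.
d = {old: new for new, olds in groups.items() for old in olds}

# Transpose so the original columns become rows, group those rows by
# their target name, take the first non-NaN value per group, and
# transpose back.
result = df.set_index("ID").T.groupby(d).first().T
print(result)
```

This relies on first() skipping NaN, so it assumes each ID has at most one non-missing value per target column, as in the sample data.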

Answer 1 (score: 0)

How about this?

df.sum(level=0, axis=1)


Out[313]:
    ID    T1    T2    T3    T4
0  1.0  20.0  60.0  80.0  20.0
1  2.0  10.0  20.0  20.0  40.0
2  3.0  30.0  40.0  40.0  40.0
3  4.0  50.0  50.0  50.0  50.0
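
The level argument of DataFrame.sum was removed in pandas 2.0, so the one-liner above only runs on older versions. A rough equivalent for newer versions, assuming the columns have already been renamed to duplicate T1..T4 labels, is to transpose and group on the index level. The frame renamed below is a hand-built stand-in for the renamed df in the question; min_count=1 is my addition so that an all-NaN group stays NaN rather than becoming 0.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Stand-in for the question's df after Step 1 (duplicate column labels).
renamed = pd.DataFrame(
    [[1, nan, nan, nan, nan, 20, 60, 80, 20],
     [2, 10, 20, 20, 40, nan, nan, nan, nan],
     [3, 30, 40, 40, 40, nan, nan, nan, nan],
     [4, nan, nan, nan, nan, 50, 50, 50, 50]],
    columns=["ID", "T1", "T2", "T3", "T4", "T1", "T2", "T3", "T4"],
)

# Move ID out of the way, transpose so the duplicate labels become the
# row index, sum the rows that share a label, and transpose back.
summed = (
    renamed.set_index("ID")
    .T.groupby(level=0)
    .sum(min_count=1)
    .T
)
print(summed)
```

Unlike the original output, ID is kept as the index here instead of being summed as a data column.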

Answer 2 (score: 0)

Try:

# set index if not already
df = df.set_index('ID')

# stack unstack:
df = df.stack().unstack().reset_index()

Output:

    ID  T1      T2      T3      T4
0   1   20.0    60.0    80.0    20.0
1   2   10.0    20.0    20.0    40.0
2   3   30.0    40.0    40.0    40.0
3   4   50.0    50.0    50.0    50.0
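
The stack/unstack round trip depends on stack dropping the NaN entries, and that default behaviour may differ between pandas versions. A more explicit sketch of the same long-then-wide idea, starting from the original (not yet renamed) df, uses melt, dropna and pivot. The column names question, value and T, and the variable wide, are hypothetical names chosen for this illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Q3.4": [np.nan, 10, 30, np.nan],
    "Q3.6": [np.nan, 20, 40, np.nan],
    "Q3.8": [np.nan, 20, 40, np.nan],
    "Q3.18": [np.nan, 40, 40, np.nan],
    "Q4.4": [20, np.nan, np.nan, 50],
    "Q4.6": [60, np.nan, np.nan, 50],
    "Q4.8": [80, np.nan, np.nan, 50],
    "Q4.12": [20, np.nan, np.nan, 50],
})

# Old column name -> target name, as in the first answer.
d = {"Q3.4": "T1", "Q4.4": "T1",
     "Q3.6": "T2", "Q4.6": "T2",
     "Q3.8": "T3", "Q4.8": "T3",
     "Q3.18": "T4", "Q4.12": "T4"}

wide = (
    df.melt(id_vars="ID", var_name="question", value_name="value")  # long format
    .dropna(subset=["value"])                                       # drop the empty cells
    .assign(T=lambda x: x["question"].map(d))                       # label each row T1..T4
    .pivot(index="ID", columns="T", values="value")                 # back to wide format
    .reset_index()
)
print(wide)
```

pivot raises an error if an ID has two non-missing values for the same target column, which can be a useful check that the columns really are mutually exclusive.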