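How to split a DataFrame into chunks wherever the index restarts and merge them horizontally

Time: 2019-11-17 21:51:41

Tags: python pandas dataframe

I have a DataFrame that is really several DataFrames concatenated vertically. I want to merge them horizontally, but I am having trouble splitting on the index values. I want to start a new chunk wherever the index is "finish", and I want to avoid doing this by hand, because the real DataFrame has about 20 sections, each of a different length.

Here is the original DataFrame.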

        12_boys  12_girls
finish                   
1        Team A    Team A
2        Team B    Team B
3        Team C    Team C
4        Team D    Team D
5        Team E    Team E
finish  14_boys  14_girls
1        Team A    Team A
2        Team B    Team B
3        Team C    Team C
4        Team D    Team D
finish  16_boys  16_girls
1        Team A    Team A
2        Team B    Team B
3        Team C    Team C
4        Team D    Team D

This is what I want.

       12_boys 14_boys 16_boys 12_girls 14_girls 16_girls
finish                                                   
1       Team A  Team A  Team A   Team A   Team A   Team A
2       Team B  Team B  Team B   Team B   Team B   Team B
3       Team C  Team C  Team C   Team C   Team C   Team C
4       Team D  Team D  Team D   Team D   Team D   Team D
5       Team E     NaN     NaN   Team E      NaN      NaN

The closest I have come is splitting and merging manually, but that does not move each section's column names into the header.

data1 = data.iloc[0:6]
data2 = data.iloc[6:10]
data3 = data.iloc[11:15]
data_merge = pd.merge(data1, data2, on='finish', how='outer')
data_merge = pd.merge(data_merge, data3, on='finish', how='outer')

Output:

        12_boys_x  12_girls_x  12_boys_y  12_girls_y  12_boys  12_girls
finish
1          Team A      Team A     Team A      Team A   Team A    Team A
2          Team B      Team B     Team B      Team B   Team B    Team B
3          Team C      Team C     Team C      Team C   Team C    Team C
4          Team D      Team D     Team D      Team D   Team D    Team D
5          Team E      Team E        NaN         NaN      NaN       NaN
finish    14_boys    14_girls        NaN         NaN      NaN       NaN

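1 Answer:

Answer 0 (score: 1)

We can first identify the groups by checking where the index changes to finish. Then we use GroupBy and pd.concat to place the groups next to one another: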

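import pandas as pd

# Label the blocks: the counter increases by one at every 'finish' row
grps = (df.index == 'finish').cumsum()

dfs = []
for grp, dfg in df.groupby(grps):
    if grp != 0:
        # The first row of every later block holds that block's column names
        dfg = dfg.copy()
        dfg.columns = dfg.iloc[0].values
    dfs.append(dfg)

# Put the blocks side by side. The shared 'finish' row ends up last
# (as long as the first block is the longest), so drop it with iloc[:-1].
df_new = pd.concat(dfs, axis=1, sort=False).iloc[:-1]

Output:

  12_boys 12_girls 14_boys 14_girls 16_boys 16_girls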
1  Team A   Team A  Team A   Team A  Team A   Team A
2  Team B   Team B  Team B   Team B  Team B   Team B
3  Team C   Team C  Team C   Team C  Team C   Team C
4  Team D   Team D  Team D   Team D  Team D   Team D
5  Team E   Team E     NaN      NaN     NaN      NaN
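
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same cumsum/GroupBy/pd.concat idea. It rebuilds the sample data from the question, assuming the index values are strings (as they typically would be after reading the data from a text file). The helper name split_blocks and its marker parameter are made up for illustration and are not part of pandas. Unlike the snippet above, it removes each 'finish' header row inside the loop instead of relying on iloc[:-1], so it should not depend on the first block being the longest. The final reordering step, which puts the boys columns before the girls columns as in the desired output, is also an assumption about the column naming.

```python
import pandas as pd

# Rebuild the question's sample frame (index values assumed to be strings).
teams = ["Team A", "Team B", "Team C", "Team D", "Team E"]
index = (
    ["1", "2", "3", "4", "5"]
    + ["finish", "1", "2", "3", "4"]
    + ["finish", "1", "2", "3", "4"]
)
df = pd.DataFrame(
    {
        "12_boys": teams + ["14_boys"] + teams[:4] + ["16_boys"] + teams[:4],
        "12_girls": teams + ["14_girls"] + teams[:4] + ["16_girls"] + teams[:4],
    },
    index=pd.Index(index, name="finish"),
)


def split_blocks(frame, marker="finish"):
    """Hypothetical helper: split at every `marker` row and join side by side."""
    # The running count of marker rows gives each block its own integer label.
    block_id = (frame.index == marker).cumsum()
    blocks = []
    for _, block in frame.groupby(block_id):
        if block.index[0] == marker:
            # Promote the marker row to column names, then drop it from the data.
            header = block.iloc[0]
            block = block.iloc[1:].copy()
            block.columns = header.to_numpy()
        blocks.append(block)
    # Outer-align on the index; shorter blocks are padded with NaN.
    return pd.concat(blocks, axis=1, sort=False)


wide = split_blocks(df)

# Optional: list all boys columns first, then all girls columns.
ordered = [c for c in wide.columns if c.endswith("_boys")]
ordered += [c for c in wide.columns if c.endswith("_girls")]
wide = wide[ordered]
print(wide)
```

Because the blocks are aligned on their index labels rather than on row position, this only works if the labels inside each block are unique.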