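I have a DataFrame that is really several DataFrames concatenated vertically. I want to combine them horizontally, but I am having trouble splitting on an index value. I want a new block to start wherever the index is "finish", and I want to avoid doing this by hand because the real DataFrame has about 20 sections, each of a different length.
Here is the original DataFrame.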
        12_boys  12_girls
finish
1        Team A    Team A
2        Team B    Team B
3        Team C    Team C
4        Team D    Team D
5        Team E    Team E
finish  14_boys  14_girls
1        Team A    Team A
2        Team B    Team B
3        Team C    Team C
4        Team D    Team D
finish  16_boys  16_girls
1        Team A    Team A
2        Team B    Team B
3        Team C    Team C
4        Team D    Team D
This is what I want.
        12_boys  14_boys  16_boys  12_girls  14_girls  16_girls
finish
1        Team A   Team A   Team A    Team A    Team A    Team A
2        Team B   Team B   Team B    Team B    Team B    Team B
3        Team C   Team C   Team C    Team C    Team C    Team C
4        Team D   Team D   Team D    Team D    Team D    Team D
5        Team E      NaN      NaN    Team E       NaN       NaN
The closest I have come is splitting and merging manually, but that does not carry the column names over.
data1 = data.iloc[0:6]
data2 = data.iloc[6:10]
data3 = data.iloc[11:15]
data_merge = pd.merge(data1, data2, on='finish', how='outer')
data_merge = pd.merge(data_merge, data3, on='finish', how='outer')
Output:
        12_boys_x  12_girls_x  12_boys_y  12_girls_y  12_boys  12_girls
finish
1          Team A      Team A     Team A      Team A   Team A    Team A
2          Team B      Team B     Team B      Team B   Team B    Team B
3          Team C      Team C     Team C      Team C   Team C    Team C
4          Team D      Team D     Team D      Team D   Team D    Team D
5          Team E      Team E        NaN         NaN      NaN       NaN
finish    14_boys    14_girls        NaN         NaN      NaN       NaN
Answer 0 (score: 1)
We can first identify the groups by checking where the index changes to finish. Then we use GroupBy and pd.concat to place the groups next to one another:
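import pandas as pd

# df is the stacked DataFrame from the question.
grps = (df.index == 'finish').cumsum()  # each 'finish' label starts a new group

dfs = []
for grp, dfg in df.groupby(grps):
    if grp != 0:
        # The 'finish' row holds this group's column names.
        dfg.columns = dfg.iloc[0].values
    dfs.append(dfg)

# Put the groups side by side, then drop the trailing 'finish' row.
df_new = pd.concat(dfs, axis=1, sort=False).iloc[:-1]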
   12_boys  12_girls  14_boys  14_girls  16_boys  16_girls
1   Team A    Team A   Team A    Team A   Team A    Team A
2   Team B    Team B   Team B    Team B   Team B    Team B
3   Team C    Team C   Team C    Team C   Team C    Team C
4   Team D    Team D   Team D    Team D   Team D    Team D
5   Team E    Team E      NaN       NaN      NaN       NaN
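For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same idea. It rebuilds the sample data by hand, so the index labels being strings is an assumption on my part, and split_blocks and its marker parameter are hypothetical names, not anything from pandas. It differs slightly from the snippet above: the 'finish' row is dropped inside each block instead of with .iloc[:-1], and the columns are reordered at the end to match the boys-then-girls layout the question asks for.

```python
import pandas as pd

# Rebuild the stacked frame from the question.
# Assumption: every index label is a string, including "1".."5".
rows = [
    ("1", "Team A", "Team A"), ("2", "Team B", "Team B"),
    ("3", "Team C", "Team C"), ("4", "Team D", "Team D"),
    ("5", "Team E", "Team E"),
    ("finish", "14_boys", "14_girls"),
    ("1", "Team A", "Team A"), ("2", "Team B", "Team B"),
    ("3", "Team C", "Team C"), ("4", "Team D", "Team D"),
    ("finish", "16_boys", "16_girls"),
    ("1", "Team A", "Team A"), ("2", "Team B", "Team B"),
    ("3", "Team C", "Team C"), ("4", "Team D", "Team D"),
]
df = pd.DataFrame(
    [r[1:] for r in rows],
    index=pd.Index([r[0] for r in rows], name="finish"),
    columns=["12_boys", "12_girls"],
)


def split_blocks(frame, marker="finish"):
    """Hypothetical helper: split on marker rows and join the blocks side by side."""
    # The running count of marker rows gives every row a block number.
    block_id = (frame.index == marker).cumsum()
    blocks = []
    for _, block in frame.groupby(block_id):
        if block.index[0] == marker:
            # The marker row carries this block's column names.
            names = block.iloc[0].to_numpy()
            block = block.iloc[1:].copy()
            block.columns = names
        blocks.append(block)
    # Outer join on the index, so shorter blocks are padded with NaN.
    return pd.concat(blocks, axis=1)


wide = split_blocks(df)

# Reorder to the layout in the question: all boys columns, then all girls.
ordered = sorted(
    wide.columns,
    key=lambda c: (c.split("_")[1], int(c.split("_")[0])),
)
wide = wide[ordered]
print(wide)
```

The sort key assumes every column name looks like "<age>_<group>"; with other naming schemes the reordering step would need its own rule.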