Split and pivot a DataFrame

Time: 2015-08-04 18:33:53

Tags: python-2.7 pandas pivot dataframe pivot-table

I have a pandas DataFrame with the following values:

import pandas as pd

df1 = pd.DataFrame([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2], [1, 1, 2, 2, 3, 3, 4, 4, 5, 5], [2000, 2000, 2000, 5000, 2000, 5000, 2000, 5000, 2000, 5000], [0, 3, 0, 3, 0, 3, 0, 3, 0, 3], [233, 233, 96, 96, 53, 53, 29, 29, 24, 24], [251.109065, 251.109065, 300.141548, 412.916402, 291.836682, 394.260558, 327.351227, 478.924355, 371.598847, 574.811102], [18.858343, 18.858343, 67.874851, -127.405555, 58.692756, -148.001670, 95.252774, -63.949017, 136.983014, 26.888185]]).T


df1.columns = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7']

df1 

   col1  col2  col3  col4  col5        col6        col7
0     2     1  2000     0   233  251.109065   18.858343
1     2     1  2000     3   233  251.109065   18.858343
2     2     2  2000     0    96  300.141548   67.874851
3     2     2  5000     3    96  412.916402 -127.405555
4     2     3  2000     0    53  291.836682   58.692756
5     2     3  5000     3    53  394.260558 -148.001670
6     2     4  2000     0    29  327.351227   95.252774
7     2     4  5000     3    29  478.924355  -63.949017
8     2     5  2000     0    24  371.598847  136.983014
9     2     5  5000     3    24  574.811102   26.888185

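Now, for each combination of col1 and col2, I want to split col3 into two separate columns, one for each value of col4. col6 and col7 need to be split into two columns each in the same way. So my resulting DataFrame should look like this: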

df2 = pd.DataFrame([[2, 2, 2, 2, 2], [1, 2, 3, 4, 5], [2000, 2000, 2000, 2000, 2000], [2000, 5000, 5000, 5000, 5000], [233, 96, 53, 29, 24], [251.109065, 300.141548, 291.836682, 327.351227, 371.598847], [251.109065, 412.916402, 394.260558, 478.924355, 574.811102], [18.858343, 67.874851, 58.692756, 95.252774, 136.983014], [18.858343, -127.405555, -148.00167, -63.949017, 26.888185]]).T


df2.columns = ['col1', 'col2', 'col3_0', 'col3_3', 'col5', 'col6_0', 'col6_3', 'col7_0', 'col7_3']

df2

   col1  col2  col3_0  col3_3  col5      col6_0      col6_3      col7_0      col7_3
0     2     1    2000    2000   233  251.109065  251.109065   18.858343   18.858343
1     2     2    2000    5000    96  300.141548  412.916402   67.874851 -127.405555
2     2     3    2000    5000    53  291.836682  394.260558   58.692756 -148.001670
3     2     4    2000    5000    29  327.351227  478.924355   95.252774  -63.949017
4     2     5    2000    5000    24  371.598847  574.811102  136.983014   26.888185

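Note that '0' and '3' are the values from col4, which are used as suffixes for the new columns: col3_0, col3_3, col6_0, col6_3, col7_0 and col7_3. If I can provide any further information about the split, please let me know. Any help is much appreciated.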

2 answers:

Answer 0: (score: 0)

You can do this with a simple merge:

df1_0 = df1[df1.col4==0].drop('col4',axis=1)
df1_3 = df1[df1.col4==3].drop('col4',axis=1)

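# col5 is constant within each (col1, col2) pair, so join on it as well;
# otherwise it would be split into col5_0 and col5_3
result = pd.merge(df1_0, df1_3, on=['col1', 'col2', 'col5'], suffixes=['_0', '_3'])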
result = result[sorted(list(result))] # to get columns in the order you like

   col1  col2  col3_0  col3_3  col5      col6_0      col6_3      col7_0  \
0     2     1    2000    2000   233  251.109065  251.109065   18.858343   
1     2     2    2000    5000    96  300.141548  412.916402   67.874851   
2     2     3    2000    5000    53  291.836682  394.260558   58.692756   
3     2     4    2000    5000    29  327.351227  478.924355   95.252774   
4     2     5    2000    5000    24  371.598847  574.811102  136.983014   

       col7_3  
0   18.858343  
1 -127.405555  
2 -148.001670  
3  -63.949017  
4   26.888185 
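One detail worth checking in the merge approach is the list passed to `on`: any non-key column present in both frames receives the suffixes. The sketch below illustrates this on the first four rows of df1, restricted to a few columns. The names `demo`, `left`, `right`, `without_col5` and `with_col5` are made up for this illustration.

```python
import pandas as pd

# First four rows of df1, restricted to a few columns (illustration only).
demo = pd.DataFrame({
    'col1': [2, 2, 2, 2],
    'col2': [1, 1, 2, 2],
    'col3': [2000, 2000, 2000, 5000],
    'col4': [0, 3, 0, 3],
    'col5': [233, 233, 96, 96],
})

left = demo[demo.col4 == 0].drop('col4', axis=1)
right = demo[demo.col4 == 3].drop('col4', axis=1)

# col5 is not a join key here, so it is duplicated with both suffixes.
without_col5 = pd.merge(left, right, on=['col1', 'col2'], suffixes=('_0', '_3'))

# col5 is a join key here, so it appears once and keeps its name.
with_col5 = pd.merge(left, right, on=['col1', 'col2', 'col5'], suffixes=('_0', '_3'))

print(list(without_col5.columns))
print(list(with_col5.columns))
```

Joining on col5 as well is only safe because col5 has the same value in both rows of each (col1, col2) pair. If it differed, the inner merge would silently drop that pair.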

Answer 1: (score: 0)

res = pd.merge(df1[df1.col4 == 0].drop('col4', axis=1), df1[df1.col4 == 3].drop('col4', axis=1), on=['col1', 'col2', 'col5'], suffixes=['_0', '_3'])

   col1  col2  col3_0  col5    col6_0    col7_0  col3_3    col6_3    col7_3
0     2     1    2000   233  251.1091   18.8583    2000  251.1091   18.8583
1     2     2    2000    96  300.1415   67.8749    5000  412.9164 -127.4056
2     2     3    2000    53  291.8367   58.6928    5000  394.2606 -148.0017
3     2     4    2000    29  327.3512   95.2528    5000  478.9244  -63.9490
4     2     5    2000    24  371.5988  136.9830    5000  574.8111   26.8882

# to sort columns by name (without transposing the frame twice)
res.sort_index(axis=1)

   col1  col2  col3_0  col3_3  col5    col6_0    col6_3    col7_0    col7_3
0     2     1    2000    2000   233  251.1091  251.1091   18.8583   18.8583
1     2     2    2000    5000    96  300.1415  412.9164   67.8749 -127.4056
2     2     3    2000    5000    53  291.8367  394.2606   58.6928 -148.0017
3     2     4    2000    5000    29  327.3512  478.9244   95.2528  -63.9490
4     2     5    2000    5000    24  371.5988  574.8111  136.9830   26.8882
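Both merges hard-code the two col4 values 0 and 3. If col4 could take other values, one possible alternative (not taken from either answer) is to move col4 into the index and unstack it. This is a minimal sketch, assuming each (col1, col2, col5, col4) combination occurs only once. It rebuilds df1 from a dict so the key columns keep integer dtype, and `wide` is a name chosen for this illustration.

```python
import pandas as pd

# Same data as df1 in the question, built column by column.
df1 = pd.DataFrame({
    'col1': [2] * 10,
    'col2': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    'col3': [2000, 2000, 2000, 5000, 2000, 5000, 2000, 5000, 2000, 5000],
    'col4': [0, 3] * 5,
    'col5': [233, 233, 96, 96, 53, 53, 29, 29, 24, 24],
    'col6': [251.109065, 251.109065, 300.141548, 412.916402, 291.836682,
             394.260558, 327.351227, 478.924355, 371.598847, 574.811102],
    'col7': [18.858343, 18.858343, 67.874851, -127.405555, 58.692756,
             -148.001670, 95.252774, -63.949017, 136.983014, 26.888185],
})

# The first three keys identify an output row; col4 becomes a column level.
wide = df1.set_index(['col1', 'col2', 'col5', 'col4']).unstack('col4')

# Flatten the (column name, col4 value) pairs into names such as 'col3_0'.
wide.columns = ['{}_{}'.format(col, key) for col, key in wide.columns]

# Restore the keys as ordinary columns and order the columns by name.
wide = wide.reset_index()
wide = wide[sorted(wide.columns)]

print(wide)
```

`unstack` raises an error if the index contains duplicate entries, so this only applies when the key combination is unique, as it is in the sample data.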