I have a pandas DataFrame with the following values:
import pandas as pd
df1 = pd.DataFrame([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2], [1, 1, 2, 2, 3, 3, 4, 4, 5, 5], [2000, 2000, 2000, 5000, 2000, 5000, 2000, 5000, 2000, 5000], [0, 3, 0, 3, 0, 3, 0, 3, 0, 3], [233, 233, 96, 96, 53, 53, 29, 29, 24, 24], [251.109065, 251.109065, 300.141548, 412.916402, 291.836682, 394.260558, 327.351227, 478.924355, 371.598847, 574.811102], [18.858343, 18.858343, 67.874851, -127.405555, 58.692756, -148.001670, 95.252774, -63.949017, 136.983014, 26.888185]]).T
df1.columns = ['col1', 'col2', 'col3', 'col4', 'col5', 'col6', 'col7']
df1
col1 col2 col3 col4 col5 col6 col7
0 2 1 2000 0 233 251.109065 18.858343
1 2 1 2000 3 233 251.109065 18.858343
2 2 2 2000 0 96 300.141548 67.874851
3 2 2 5000 3 96 412.916402 -127.405555
4 2 3 2000 0 53 291.836682 58.692756
5 2 3 5000 3 53 394.260558 -148.001670
6 2 4 2000 0 29 327.351227 95.252774
7 2 4 5000 3 29 478.924355 -63.949017
8 2 5 2000 0 24 371.598847 136.983014
9 2 5 5000 3 24 574.811102 26.888185
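Now, for each combination of col1 and col2 values, I want to split col3 into two separate columns, one for each value of col4. col6 and col7 should likewise each be split into two separate columns on the same basis. So my resulting DataFrame should look like this: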
df2 = pd.DataFrame([[2, 2, 2, 2, 2], [1, 2, 3, 4, 5], [2000, 2000, 2000, 2000, 2000], [2000, 5000, 5000, 5000, 5000], [233, 96, 53, 29, 24], [251.109065, 300.141548, 291.836682, 327.351227, 371.598847], [251.109065, 412.916402, 394.260558, 478.924355, 574.811102], [18.858343, 67.874851, 58.692756, 95.252774, 136.983014], [18.858343, -127.405555, -148.00167, -63.949017, 26.888185]]).T
df2.columns = ['col1', 'col2', 'col3_0', 'col3_3', 'col5', 'col6_0', 'col6_3', 'col7_0', 'col7_3']
df2
col1 col2 col3_0 col3_3 col5 col6_0 col6_3 col7_0 col7_3
0 2 1 2000 2000 233 251.109065 251.109065 18.858343 18.858343
1 2 2 2000 5000 96 300.141548 412.916402 67.874851 -127.405555
2 2 3 2000 5000 53 291.836682 394.260558 58.692756 -148.001670
3 2 4 2000 5000 29 327.351227 478.924355 95.252774 -63.949017
4 2 5 2000 5000 24 371.598847 574.811102 136.983014 26.888185
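Note that '0' and '3' are the values from col4, and they are used as suffixes for the new columns: col3_0, col3_3, col6_0, col6_3, col7_0 and col7_3. Please let me know if I can provide any further information about the split. Any help is much appreciated.
Answer 0: (score: 0)
You can do this with a simple merge: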
df1_0 = df1[df1.col4==0].drop('col4',axis=1)
df1_3 = df1[df1.col4==3].drop('col4',axis=1)
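# col5 is included in the keys so that it is kept once instead of being split into col5_0/col5_3
result = pd.merge( df1_0, df1_3, on=['col1','col2','col5'],suffixes=['_0','_3'] )
result = result[sorted(result.columns)] # to get columns in the order you like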
col1 col2 col3_0 col3_3 col5 col6_0 col6_3 col7_0 col7_3
0 2 1 2000 2000 233 251.109065 251.109065 18.858343 18.858343
1 2 2 2000 5000 96 300.141548 412.916402 67.874851 -127.405555
2 2 3 2000 5000 53 291.836682 394.260558 58.692756 -148.001670
3 2 4 2000 5000 29 327.351227 478.924355 95.252774 -63.949017
4 2 5 2000 5000 24 371.598847 574.811102 136.983014 26.888185
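The choice of merge keys matters for the column names you end up with. The following is a minimal sketch rather than part of the original answer: the dict-based construction of df1 and the names left, right, two_keys and three_keys are my own. Since col5 is constant within each col1/col2 pair, leaving it out of `on` makes pandas treat it as an overlapping non-key column and suffix it like the others.

```python
import pandas as pd

# Same data as in the question, built from a dict (an assumption of this
# sketch) so that each column keeps its own dtype.
df1 = pd.DataFrame({
    'col1': [2] * 10,
    'col2': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    'col3': [2000, 2000, 2000, 5000, 2000, 5000, 2000, 5000, 2000, 5000],
    'col4': [0, 3] * 5,
    'col5': [233, 233, 96, 96, 53, 53, 29, 29, 24, 24],
    'col6': [251.109065, 251.109065, 300.141548, 412.916402, 291.836682,
             394.260558, 327.351227, 478.924355, 371.598847, 574.811102],
    'col7': [18.858343, 18.858343, 67.874851, -127.405555, 58.692756,
             -148.001670, 95.252774, -63.949017, 136.983014, 26.888185],
})

# One frame per col4 value, with col4 itself dropped.
left = df1[df1.col4 == 0].drop(columns='col4')
right = df1[df1.col4 == 3].drop(columns='col4')

# Keys without col5: col5 overlaps in both frames and gets suffixed too.
two_keys = pd.merge(left, right, on=['col1', 'col2'], suffixes=('_0', '_3'))

# Keys with col5: col5 is kept once, unsuffixed.
three_keys = pd.merge(left, right, on=['col1', 'col2', 'col5'],
                      suffixes=('_0', '_3'))

print(sorted(two_keys.columns))
print(sorted(three_keys.columns))
```

Only the second merge yields the exact column set asked for in the question.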
Answer 1: (score: 0)
res = pd.merge(df1[df1.col4 == 0].drop('col4', axis=1), df1[df1.col4 == 3].drop('col4', axis=1), on=['col1', 'col2', 'col5'], suffixes=['_0', '_3'])
col1 col2 col3_0 col5 col6_0 col7_0 col3_3 col6_3 col7_3
0 2 1 2000 233 251.1091 18.8583 2000 251.1091 18.8583
1 2 2 2000 96 300.1415 67.8749 5000 412.9164 -127.4056
2 2 3 2000 53 291.8367 58.6928 5000 394.2606 -148.0017
3 2 4 2000 29 327.3512 95.2528 5000 478.9244 -63.9490
4 2 5 2000 24 371.5988 136.9830 5000 574.8111 26.8882
# to sort columns
res.sort_index(axis=1)
col1 col2 col3_0 col3_3 col5 col6_0 col6_3 col7_0 col7_3
0 2 1 2000 2000 233 251.1091 251.1091 18.8583 18.8583
1 2 2 2000 5000 96 300.1415 412.9164 67.8749 -127.4056
2 2 3 2000 5000 53 291.8367 394.2606 58.6928 -148.0017
3 2 4 2000 5000 29 327.3512 478.9244 95.2528 -63.9490
4 2 5 2000 5000 24 371.5988 574.8111 136.9830 26.8882
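If col4 could take more than two values, writing one filtered frame per value becomes tedious. One possible alternative, which neither answer above uses, is set_index followed by unstack. This is only a sketch under two assumptions: each (col1, col2, col5, col4) combination occurs at most once, and col5 is constant within a col1/col2 pair. The names keys and wide, and the dict-based construction of df1, are illustrative.

```python
import pandas as pd

# Same data as in the question, built from a dict (an assumption of this
# sketch) so that each column keeps its own dtype.
df1 = pd.DataFrame({
    'col1': [2] * 10,
    'col2': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    'col3': [2000, 2000, 2000, 5000, 2000, 5000, 2000, 5000, 2000, 5000],
    'col4': [0, 3] * 5,
    'col5': [233, 233, 96, 96, 53, 53, 29, 29, 24, 24],
    'col6': [251.109065, 251.109065, 300.141548, 412.916402, 291.836682,
             394.260558, 327.351227, 478.924355, 371.598847, 574.811102],
    'col7': [18.858343, 18.858343, 67.874851, -127.405555, 58.692756,
             -148.001670, 95.252774, -63.949017, 136.983014, 26.888185],
})

# Columns that identify a row of the wide result.
keys = ['col1', 'col2', 'col5']

# Move col4 from the row index into the column index: every remaining
# column (col3, col6, col7) is split once per distinct col4 value.
wide = df1.set_index(keys + ['col4']).unstack('col4')

# The columns are now a MultiIndex of (name, col4 value) pairs;
# flatten them into names such as 'col3_0' and 'col3_3'.
wide.columns = [f'{name}_{suffix}' for name, suffix in wide.columns]

# Bring the key columns back and order the columns alphabetically.
wide = wide.reset_index()
wide = wide[sorted(wide.columns)]

print(wide)
```

Note that unstack raises an error if the index contains duplicate entries, so this approach only applies when the key columns plus col4 identify each row uniquely.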