Here is a reproducible example:
import pandas as pd
df = pd.DataFrame([
    ['Type A', 'Event1', 1, 2, 3], ['Type A', 'Event1', 4, 5, 6], ['Type A', 'Event1', 7, 8, 9],
    ['Type A', 'Event2', 10, 11, 12], ['Type A', 'Event2', 13, 14, 15], ['Type A', 'Event2', 16, 17, 18],
    ['Type B', 'Event1', 19, 20, 21], ['Type B', 'Event1', 22, 23, 24], ['Type B', 'Event1', 25, 26, 27],
    ['Type B', 'Event2', 28, 29, 30], ['Type B', 'Event2', 31, 32, 33], ['Type B', 'Event2', 34, 35, 36],
])
df.columns = ['TypeName', 'EventNumber', 'PricePart1', 'PricePart2', 'PricePart3']
print(df)
Gives:
   TypeName EventNumber  PricePart1  PricePart2  PricePart3
0    Type A      Event1           1           2           3
1    Type A      Event1           4           5           6
2    Type A      Event1           7           8           9
3    Type A      Event2          10          11          12
4    Type A      Event2          13          14          15
5    Type A      Event2          16          17          18
6    Type B      Event1          19          20          21
7    Type B      Event1          22          23          24
8    Type B      Event1          25          26          27
9    Type B      Event2          28          29          30
10   Type B      Event2          31          32          33
11   Type B      Event2          34          35          36
Here is what I tried:
df['Average'] = df[['PricePart1', 'PricePart2', 'PricePart3']].mean(axis = 1)
print(df)
   TypeName EventNumber  PricePart1  PricePart2  PricePart3  Average
0    Type A      Event1           1           2           3      2.0
1    Type A      Event1           4           5           6      5.0
2    Type A      Event1           7           8           9      8.0
3    Type A      Event2          10          11          12     11.0
4    Type A      Event2          13          14          15     14.0
5    Type A      Event2          16          17          18     17.0
6    Type B      Event1          19          20          21     20.0
7    Type B      Event1          22          23          24     23.0
8    Type B      Event1          25          26          27     26.0
9    Type B      Event2          28          29          30     29.0
10   Type B      Event2          31          32          33     32.0
11   Type B      Event2          34          35          36     35.0
Now that I have a new column named Average, I can group by the TypeName and EventNumber columns and find the 25th and 50th percentiles with the following code:
print(df.groupby(['TypeName', 'EventNumber'])['Average'].quantile([0.25, 0.50]).reset_index())
What I have:
  TypeName EventNumber  level_2  Average
0   Type A      Event1     0.25      3.5
1   Type A      Event1     0.50      5.0
2   Type A      Event2     0.25     12.5
3   Type A      Event2     0.50     14.0
4   Type B      Event1     0.25     21.5
5   Type B      Event1     0.50     23.0
6   Type B      Event2     0.25     30.5
7   Type B      Event2     0.50     32.0
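I would like the level_2 values to become separate columns holding the values from the Average column, as in this output DataFrame that I built by hand: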
df1 = pd.DataFrame([['Type A', 'Event1', 3.5, 5], ['Type A', 'Event2', 12.5, 14], ['Type B', 'Event1', 21.5, 23], ['Type B', 'Event2', 30.5, 32]])
df1.columns = ['TypeName', 'EventNumber', '0.25', '0.50']
print(df1)
What I want:
  TypeName EventNumber  0.25  0.50
0   Type A      Event1   3.5     5
1   Type A      Event2  12.5    14
2   Type B      Event1  21.5    23
3   Type B      Event2  30.5    32
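I'm fairly sure this is a duplicate, but I searched StackOverflow and couldn't find an answer, probably because I had trouble wording the question (or maybe I'm just being dense).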
Answer 0 (score: 3)
Use unstack together with reset_index:
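# Store the result under a new name so the original df stays available
out = (df.groupby(['TypeName', 'EventNumber'])['Average']
         .quantile([0.25, 0.50])
         .unstack()        # move the quantile level into columns
         .reset_index())   # turn the group keys back into columns
print(out)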
  TypeName EventNumber  0.25   0.5
0   Type A      Event1   3.5   5.0
1   Type A      Event2  12.5  14.0
2   Type B      Event1  21.5  23.0
3   Type B      Event2  30.5  32.0
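Note that the new column labels are the floats 0.25 and 0.5, not the strings '0.25' and '0.50' used in the hand-built frame from the question. If string labels are wanted, one possible approach (a sketch, not part of the original answer) is to rename the columns right after unstack, before the group keys come back as columns. The compact rebuild of the data with NumPy and the name out are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Rebuild the question's data: prices 1..36 in three columns.
df = pd.DataFrame(np.arange(1, 37).reshape(12, 3),
                  columns=['PricePart1', 'PricePart2', 'PricePart3'])
df.insert(0, 'TypeName', ['Type A'] * 6 + ['Type B'] * 6)
df.insert(1, 'EventNumber', (['Event1'] * 3 + ['Event2'] * 3) * 2)
df['Average'] = df[['PricePart1', 'PricePart2', 'PricePart3']].mean(axis=1)

out = (df.groupby(['TypeName', 'EventNumber'])['Average']
         .quantile([0.25, 0.50])
         .unstack()
         # At this point the columns are only the float quantiles,
         # so formatting every label is safe: 0.5 becomes '0.50'.
         .rename(columns='{:.2f}'.format)
         .reset_index())
print(out)
```

Doing the rename before reset_index avoids having to special-case the TypeName and EventNumber labels.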
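A syntactic-sugar variant: the new Average column is not needed, because groupby can be used with three Series (the row means and the two grouping columns):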
# df is the original frame from the question; no 'Average' column is added to it
s = df[['PricePart1', 'PricePart2', 'PricePart3']].mean(axis=1)
out = (s.groupby([df['TypeName'], df['EventNumber']])
        .quantile([0.25, 0.50])
        .unstack()
        .reset_index())
print(out)
  TypeName EventNumber  0.25   0.5
0   Type A      Event1   3.5   5.0
1   Type A      Event2  12.5  14.0
2   Type B      Event1  21.5  23.0
3   Type B      Event2  30.5  32.0
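As a sanity check, the two variants can be run side by side on the question's data and compared. The sketch below is an assumption-laden illustration rather than part of the answer: the names price_cols, with_column and without_column are invented here, the data is rebuilt compactly with NumPy, and unstack is called with level=-1 (its default) only to make explicit that the innermost level, the quantiles, is the one being moved into columns.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data: prices 1..36 in three columns.
price_cols = ['PricePart1', 'PricePart2', 'PricePart3']
df = pd.DataFrame(np.arange(1, 37).reshape(12, 3), columns=price_cols)
df.insert(0, 'TypeName', ['Type A'] * 6 + ['Type B'] * 6)
df.insert(1, 'EventNumber', (['Event1'] * 3 + ['Event2'] * 3) * 2)

# Variant 1: materialise the Average column, then group by column names.
with_column = (df.assign(Average=df[price_cols].mean(axis=1))
                 .groupby(['TypeName', 'EventNumber'])['Average']
                 .quantile([0.25, 0.50])
                 .unstack(level=-1)
                 .reset_index())

# Variant 2: group the row means directly by the two key Series.
s = df[price_cols].mean(axis=1)
without_column = (s.groupby([df['TypeName'], df['EventNumber']])
                    .quantile([0.25, 0.50])
                    .unstack(level=-1)
                    .reset_index())

# Raises AssertionError if the two results differ.
pd.testing.assert_frame_equal(with_column, without_column)
print(with_column)
```

The second variant works because the grouping Series share their index with s, so pandas can align the keys row by row.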