Computing percentages in a multi-dimensional crosstab

Time: 2017-10-18 23:37:03

Tags: python pandas crosstab

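I built a crosstab on 3 variables (group, position, offer). How can I compute proportions that sum to 1 across offer within each position for every group, instead of getting margins?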

df = pd.crosstab(df.group, [df.position, df.offer], margins = True)

The original DataFrame df:

pid offer   position    group
1   accept  left        group1
1   accept  left        group1
1   accept  right       group2
1   reject  right       group2
1   reject  right       group1
2   reject  right       group1
2   reject  left        group2
2   accept  left        group3
3   accept  right       group3
3   reject  right       group1
3   reject  right       group2
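
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample data from the table above. The names `rows` and `ct` are only illustrative. The crosstab goes into a separate variable so that `df` is not overwritten, which is an assumption about what is wanted here.

```python
import pandas as pd

# Rows copied from the table above, in the same order.
rows = [
    (1, "accept", "left", "group1"),
    (1, "accept", "left", "group1"),
    (1, "accept", "right", "group2"),
    (1, "reject", "right", "group2"),
    (1, "reject", "right", "group1"),
    (2, "reject", "right", "group1"),
    (2, "reject", "left", "group2"),
    (2, "accept", "left", "group3"),
    (3, "accept", "right", "group3"),
    (3, "reject", "right", "group1"),
    (3, "reject", "right", "group2"),
]
df = pd.DataFrame(rows, columns=["pid", "offer", "position", "group"])

# Same call as in the question, stored as ct (hypothetical name) to keep df intact.
ct = pd.crosstab(df.group, [df.position, df.offer], margins=True)
print(ct)
```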

My current crosstab:

 position         left                 right          All
 offer          accept   reject    accept   reject        
 group1         2         0           0       3       5
 group2         0         1           1       2       4
 group3         1         0           1       0       2
 All            3         1           2       5       11

Expected result:

 position         left                 right
 offer          accept   reject    accept   reject       
 group1            1       0         0        1 
 group2            0       1         0.33     0.66  
 group3            1       0         1        0  

Thanks!

2 answers:

Answer 0 (score: 1)

Add one more step: group the columns by level 0 (position) and divide c by the resulting sums.

c = pd.crosstab(df.group, [df.position, df.offer])
df = c / c.groupby(level=0, axis=1).sum()
print(df)

position   left            right          
offer    accept reject    accept    reject
group                                     
group1      1.0    0.0  0.000000  1.000000
group2      0.0    1.0  0.333333  0.666667
group3      1.0    0.0  1.000000  0.000000
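
On recent pandas releases (2.1 and later, as far as I know), `groupby(..., axis=1)` raises a deprecation warning. A sketch of the same normalization without `axis=1` is to transpose, group the rows by the `position` level, and transpose back. The names `totals` and `pct` are only illustrative, and the sample data is rebuilt here so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "pid": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "offer": ["accept", "accept", "accept", "reject", "reject", "reject",
              "reject", "accept", "accept", "reject", "reject"],
    "position": ["left", "left", "right", "right", "right", "right",
                 "left", "left", "right", "right", "right"],
    "group": ["group1", "group1", "group2", "group2", "group1", "group1",
              "group2", "group3", "group3", "group1", "group2"],
})

c = pd.crosstab(df.group, [df.position, df.offer])

# After transposing, (position, offer) is the row index. transform("sum")
# returns per-position totals with the same shape as c.T, so the division
# is element-wise and needs no index alignment.
totals = c.T.groupby(level="position").transform("sum")
pct = (c.T / totals).T
print(pct)
```

Note that a (group, position) pair with no observations at all would give 0/0 here and come out as NaN. That does not happen with the sample data.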

If you are a perfectionist like me and want whole numbers shown as integers, you can do this:

df = c.div(c.groupby(level=0, axis=1).sum()).astype(object)
print(df)

position   left            right          
offer    accept reject    accept    reject
group                                     
group1        1      0         0         1
group2        0      1  0.333333  0.666667
group3        1      0         1         0
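
If the goal is only how the table prints, an alternative sketch is to keep the values as floats and format them at print time. The `g`-style formatter and the names `pct` and `text` are my own choices here, not part of the answer above.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "pid": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "offer": ["accept", "accept", "accept", "reject", "reject", "reject",
              "reject", "accept", "accept", "reject", "reject"],
    "position": ["left", "left", "right", "right", "right", "right",
                 "left", "left", "right", "right", "right"],
    "group": ["group1", "group1", "group2", "group2", "group1", "group1",
              "group2", "group3", "group3", "group1", "group2"],
})

c = pd.crosstab(df.group, [df.position, df.offer])
# Proportions within each position, computed on the transposed table.
pct = (c.T / c.T.groupby(level="position").transform("sum")).T

# The "g" format drops trailing zeros, so 1.0 is rendered as "1";
# the underlying data stays float64.
text = pct.to_string(float_format=lambda x: f"{x:g}")
print(text)
```

This keeps the frame numeric, so later arithmetic on it still behaves as expected.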

Answer 1 (score: 0)

You can use

In [4013]: dfa = df.groupby(['group', 'position', 'offer']).size().unstack(fill_value=0)

In [4014]: dfa.div(dfa.sum(axis=1), axis=0).unstack()
Out[4014]:
offer    accept           reject
position   left     right   left     right
group
group1      1.0  0.000000    0.0  1.000000
group2      0.0  0.333333    1.0  0.666667
group3      1.0  1.000000    0.0  0.000000
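
The same row-wise idea can also be sketched with `crosstab` itself: put group and position in the index and let `normalize="index"` do the division. The names `res`, `wide` and `by_pos` are only illustrative, and the last step (reordering the column levels to match the layout in the question) is optional.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "pid": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "offer": ["accept", "accept", "accept", "reject", "reject", "reject",
              "reject", "accept", "accept", "reject", "reject"],
    "position": ["left", "left", "right", "right", "right", "right",
                 "left", "left", "right", "right", "right"],
    "group": ["group1", "group1", "group2", "group2", "group1", "group1",
              "group2", "group3", "group3", "group1", "group2"],
})

# One row per (group, position); each row is divided by its own total.
res = pd.crosstab([df.group, df.position], df.offer, normalize="index")

# Move position into the columns: columns become (offer, position).
wide = res.unstack("position")

# Optional: put position on top, as in the question's expected layout.
by_pos = wide.swaplevel(axis=1).sort_index(axis=1)
print(by_pos)
```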

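You can also build dfa with pivot_table. Pass fill_value=0 so that missing combinations come out as 0 rather than NaN:

df.pivot_table(index=['group', 'position'], columns='offer', aggfunc=len, fill_value=0)['pid']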