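I built a crosstab on 3 variables (group, position, offer). How can I normalize by the sum over one variable, offer, rather than by the margins (i.e., normalize within each position column, so that accept and reject sum to 1 for every group)?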
df = pd.crosstab(df.group, [df.position, df.offer], margins = True)
My DataFrame (df):
pid offer position group
1 accept left group1
1 accept left group1
1 accept right group2
1 reject right group2
1 reject right group1
2 reject right group1
2 reject left group2
2 accept left group3
3 accept right group3
3 reject right group1
3 reject right group2
My current crosstab:
position   left           right          All
offer      accept reject  accept reject
group1          2      0       0      3    5
group2          0      1       1      2    4
group3          1      0       1      0    2
All             3      1       2      5   11
Expected result:
position   left           right
offer      accept reject  accept reject
group1          1      0       0      1
group2          0      1    0.33   0.66
group3          1      0       1      0
Thanks!
Answer 0 (score: 1)
Add one more step: group the columns by level 0 (position) with groupby, then divide c by the resulting sum.
c = pd.crosstab(df.group, [df.position, df.offer])
df = c / c.groupby(level=0, axis=1).sum()
print(df)
position   left          right
offer    accept reject    accept    reject
group
group1      1.0    0.0  0.000000  1.000000
group2      0.0    1.0  0.333333  0.666667
group3      1.0    0.0  1.000000  0.000000
If you are a perfectionist like me, you may want whole numbers displayed as integers. You can do that like this:
df = c.div(c.groupby(level=0, axis=1).sum()).astype(object)
print(df)
position   left           right
offer    accept reject    accept    reject
group
group1        1      0         0         1
group2        0      1  0.333333  0.666667
group3        1      0         1         0
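For reference, here is a self-contained sketch of the same idea that avoids groupby(..., axis=1), which recent pandas releases deprecate. It rebuilds the sample data from the question and, as an assumption on my part rather than the answer's exact code, transposes the crosstab so the grouping happens on the index. The names totals and shares are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "pid": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "offer": ["accept", "accept", "accept", "reject", "reject", "reject",
              "reject", "accept", "accept", "reject", "reject"],
    "position": ["left", "left", "right", "right", "right", "right",
                 "left", "left", "right", "right", "right"],
    "group": ["group1", "group1", "group2", "group2", "group1", "group1",
              "group2", "group3", "group3", "group1", "group2"],
})

# Counts per group (rows) and position/offer (two-level columns), no margins.
c = pd.crosstab(df["group"], [df["position"], df["offer"]])

# Transpose so position/offer become the index, sum within each position,
# and broadcast that sum back to every row with transform. Transposing
# again yields a frame with the same shape and labels as c.
totals = c.T.groupby(level="position").transform("sum").T

# Element-wise division: accept/reject shares within each position.
shares = c / totals
print(shares.round(2))
```

Because totals has exactly the same row and column labels as c, the division needs no index alignment across levels.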
Answer 1 (score: 0)
You can use:
In [4013]: dfa = df.groupby(['group', 'position', 'offer']).size().unstack(fill_value=0)
In [4014]: dfa.div(dfa.sum(axis=1), axis=0).unstack()
Out[4014]:
offer    accept            reject
position   left     right    left     right
group
group1      1.0  0.000000     0.0  1.000000
group2      0.0  0.333333     1.0  0.666667
group3      1.0  1.000000     0.0  0.000000
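You can also build dfa with pivot_table (add fill_value=0 if you want zeros rather than NaN for combinations that never occur):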
df.pivot_table(index=['group', 'position'], columns='offer', aggfunc=len)['pid']
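A self-contained sketch of the pivot_table route, carried through to the normalized table, might look like the following. Note the substitutions, which are mine and not the answer's exact call: aggfunc="count" on pid instead of len, fill_value=0 to avoid NaN, and a final swaplevel/sort_index so that the columns match the position-over-offer layout asked for in the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "pid": [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3],
    "offer": ["accept", "accept", "accept", "reject", "reject", "reject",
              "reject", "accept", "accept", "reject", "reject"],
    "position": ["left", "left", "right", "right", "right", "right",
                 "left", "left", "right", "right", "right"],
    "group": ["group1", "group1", "group2", "group2", "group1", "group1",
              "group2", "group3", "group3", "group1", "group2"],
})

# One row per (group, position), one column per offer, cells hold counts.
counts = df.pivot_table(index=["group", "position"], columns="offer",
                        values="pid", aggfunc="count", fill_value=0)

# Divide each row by its own total, then move position into the columns.
shares = counts.div(counts.sum(axis=1), axis=0).unstack("position")

# Reorder the column levels to (position, offer) and sort them.
shares = shares.swaplevel(axis=1).sort_index(axis=1)
print(shares.round(2))
```

Normalizing before unstack keeps the arithmetic to a plain row-wise division, since each row of counts already corresponds to one group/position pair.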