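I have some data on phone brand usage by sex that I want to work with in pandas.
I need to count the values and build new columns from those counts.
The data, df, looks like this: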
Sex Apple Samsung Huawei Tecno
Male Yes Yes No No
Female Yes Yes No No
Female Yes Yes No No
Male Yes Yes No No
Male No Yes No No
Female No No No No
Female Yes Yes No No
Male Yes Yes No No
Male Yes Yes No No
Male Yes Yes No No
Female Yes Yes No No
Female Yes Yes No No
Female Yes Yes No No
Female Yes Yes No No
Female No Yes No No
Female Yes Yes No Yes
Male Yes Yes No No
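To make this reproducible, here is a minimal sketch that builds `df` from the table above. Reading it as whitespace-separated text through `io.StringIO` is only an assumption for the example; the real data may come from a file or a database.

```python
import io

import pandas as pd

# The sample table from the question, pasted as whitespace-separated text.
raw = """Sex Apple Samsung Huawei Tecno
Male Yes Yes No No
Female Yes Yes No No
Female Yes Yes No No
Male Yes Yes No No
Male No Yes No No
Female No No No No
Female Yes Yes No No
Male Yes Yes No No
Male Yes Yes No No
Male Yes Yes No No
Female Yes Yes No No
Female Yes Yes No No
Female Yes Yes No No
Female Yes Yes No No
Female No Yes No No
Female Yes Yes No Yes
Male Yes Yes No No
"""

# sep=r"\s+" splits on any run of whitespace; "Yes"/"No" stay plain strings.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df.head())
```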
This is what I want:
Sex     Response  Apple  Samsung  Huawei  Tecno
Male    Yes           6        7       0      0
        No            1        0       7      7
Female  Yes           8        9       0      1
        No            2        1      10      9
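I have been going around in circles trying to get this to work, and my code is so messy that I am embarrassed to post it. This is at least where I started: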
for name, group in df.groupby('Sex'):
    print(name)
    print(group)
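I believe I can get there with some combination of groupby and unstack. Also, if anyone can point me to a good tutorial on grouping multi-level data, I would appreciate it.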
Answer 0 (score: 1)
Use:
df = (df.melt('Sex', value_name='Response')
        .groupby(['Sex', 'Response', 'variable'])
        .size()
        .unstack(fill_value=0)
        .rename_axis(None, axis=1))
print(df)
                 Apple  Huawei  Samsung  Tecno
Sex    Response
Female No            2      10        1      9
       Yes           8       0        9      1
Male   No            1       7        0      7
       Yes           6       0        7      0
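To see why this works, it may help to look at the intermediate steps on a tiny frame. The sketch below uses made-up data and illustrative names (`small`, `long_df`, `counts`) that are not part of the question. `melt` turns every brand cell into its own row, so counting rows per (Sex, Response, variable) gives the totals, and `unstack` moves the brand level into the columns.

```python
import pandas as pd

# Hypothetical two-row frame, only for illustrating the reshaping steps.
small = pd.DataFrame({
    "Sex": ["Male", "Female"],
    "Apple": ["Yes", "No"],
    "Samsung": ["Yes", "Yes"],
})

# Wide -> long: one row per (Sex, brand) pair; the brand name lands in
# the default "variable" column and the answer in "Response".
long_df = small.melt("Sex", value_name="Response")
print(long_df)

# Count rows per combination, then pivot the brand level into columns.
counts = (long_df.groupby(["Sex", "Response", "variable"])
                 .size()
                 .unstack(fill_value=0)
                 .rename_axis(None, axis=1))
print(counts)
```

One thing to watch for: `fill_value=0` only fills missing brands within a (Sex, Response) pair that occurs at least once. A pair that never occurs at all, such as Male/No in this tiny frame, produces no row. In the question's data all four pairs occur, so it does not matter there.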
Another, similar solution:
df = (df.melt('Sex', value_name='Response')
        .groupby(['Sex', 'Response'])['variable']
        .value_counts()
        .unstack(fill_value=0)
        .rename_axis(None, axis=1))
Or:
import pandas as pd

df1 = df.melt('Sex', value_name='Response')
df = pd.crosstab([df1['Sex'], df1['Response']], df1['variable']).rename_axis(None, axis=1)
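All three variants sort the index and the columns alphabetically, so the result has Female before Male, No before Yes, and Huawei before Samsung. If the exact layout from the question matters, one possible follow-up is a `reindex` with an explicit order. This is a sketch, not part of the original answer; the names `brands`, `row_order` and `result` are made up for illustration, and the data is rebuilt from the question's table so the block runs on its own.

```python
import io

import pandas as pd

# Sample table from the question, parsed as whitespace-separated text.
raw = """Sex Apple Samsung Huawei Tecno
Male Yes Yes No No
Female Yes Yes No No
Female Yes Yes No No
Male Yes Yes No No
Male No Yes No No
Female No No No No
Female Yes Yes No No
Male Yes Yes No No
Male Yes Yes No No
Male Yes Yes No No
Female Yes Yes No No
Female Yes Yes No No
Female Yes Yes No No
Female Yes Yes No No
Female No Yes No No
Female Yes Yes No Yes
Male Yes Yes No No
"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Remember the original brand order before reshaping.
brands = [col for col in df.columns if col != "Sex"]

long_df = df.melt("Sex", value_name="Response")
counts = pd.crosstab([long_df["Sex"], long_df["Response"]],
                     long_df["variable"]).rename_axis(None, axis=1)

# Desired row order: Male before Female, Yes before No.
row_order = pd.MultiIndex.from_product(
    [["Male", "Female"], ["Yes", "No"]], names=["Sex", "Response"]
)

# fill_value=0 also covers any (Sex, Response) pair missing from the data.
result = counts.reindex(index=row_order, columns=brands, fill_value=0)
print(result)
```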