I have some sample data, for example
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3, 3, 4],
                   'B': ['Very Happy', 'Sad', 'Sad', 'Happy', 'Happy'],
                   'C': [True, False, False, True, False]})
>> df
   A           B      C
0  1  Very Happy   True
1  1         Sad  False
2  3         Sad  False
3  3       Happy   True
4  4       Happy  False
and I want to count the occurrences of each combination, so crosstab is the way to go:
counts = pd.crosstab(index = df['A'], columns = [df['B'],df['C']])
>> counts
B  Happy           Sad  Very Happy
C  False   True  False        True
A
1      0      0      1           1
3      0      1      1           0
4      1      0      0           0
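However, neither the 'Very Sad' emotion nor the ID 2 happens to appear in this data sample, so they are missing from the crosstab. I would like the result to look like this: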
   Very Happy    Happy         Sad           Very Sad
    True  False   True  False   True  False   True  False
1      1      0      0      0      0      1      0      0
2      0      0      0      0      0      0      0      0
3      0      0      1      0      0      1      0      0
4      0      0      0      1      0      0      0      0
My workaround is to set up a template
emotions = ['Very Happy', 'Happy', 'Sad', 'Very Sad']
ids = [1,2,3,4]
truths = [True,False]
template = pd.DataFrame(index=pd.Index(ids),
                        columns=pd.MultiIndex.from_product((emotions, truths)))
>> template
   Very Happy    Happy         Sad           Very Sad
    True  False   True  False   True  False   True  False
1    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN
2    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN
3    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN
4    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN
and then fill it in
template.unstack()[counts.unstack().index] = counts.unstack()
template = template.fillna(0)
>> template
   Very Happy    Happy         Sad           Very Sad
    True  False   True  False   True  False   True  False
1      1      0      0      0      0      1      0      0
2      0      0      0      0      0      0      0      0
3      0      0      1      0      0      1      0      0
4      0      0      0      1      0      0      0      0
The problem is that it feels like there must be a cleaner, more readable way to achieve the same result. Any ideas?
Answer 0 (score: 2)
This is what pivot_table is for:
>>> pv = df.pivot_table(index='A',
... columns=['B', 'C'],
... aggfunc='size',
... fill_value=0)
>>> pv
B  Happy           Sad  Very Happy
C  False   True  False        True
A
1      0      0      1           1
3      0      1      1           0
4      1      0      0           0
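For intuition about what aggfunc='size' is counting, roughly the same table can be sketched with a plain groupby followed by unstack. This is an alternative shown for illustration, not the call used above; the name by_group is made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3, 3, 4],
                   'B': ['Very Happy', 'Sad', 'Sad', 'Happy', 'Happy'],
                   'C': [True, False, False, True, False]})

# Count the rows in each observed (A, B, C) group, then move B and C
# into the columns. Combinations that never occur get fill_value=0,
# but only B/C pairs that occur at least once become columns at all.
by_group = (df.groupby(['A', 'B', 'C'])
              .size()
              .unstack(['B', 'C'], fill_value=0))

print(by_group)
```

Only combinations that are actually observed produce a row or a column, which is why 'Very Sad' and ID 2 are absent here as well.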
Rows and columns are missing from pv because their cross-sections do not exist in the frame. You can add them with .reindex:
>>> cols = pd.MultiIndex.from_product((['Very Happy', 'Happy', 'Sad', 'Very Sad'], [True, False]))
>>> pv.reindex(index=range(1, 5), columns=cols, fill_value=0)
   Very Happy    Happy         Sad           Very Sad
    True  False   True  False   True  False   True  False
A
1      1      0      0      0      0      1      0      0
2      0      0      0      0      0      0      0      0
3      0      0      1      0      0      1      0      0
4      0      0      0      1      0      0      0      0
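The same reindex step should also work on the crosstab from the question, so the template can be dropped entirely. Below is a minimal sketch that wraps both steps in a helper; full_crosstab and its parameter names are hypothetical, and the column names 'A', 'B' and 'C' are assumed to match the sample frame.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3, 3, 4],
                   'B': ['Very Happy', 'Sad', 'Sad', 'Happy', 'Happy'],
                   'C': [True, False, False, True, False]})

emotions = ['Very Happy', 'Happy', 'Sad', 'Very Sad']
ids = [1, 2, 3, 4]
truths = [True, False]


def full_crosstab(frame, ids, emotions, truths):
    """Count (B, C) combinations per A, including unobserved ones.

    Hypothetical helper: assumes the frame has columns 'A', 'B', 'C'.
    """
    # Counts for the combinations that actually occur in the data.
    counts = pd.crosstab(index=frame['A'],
                         columns=[frame['B'], frame['C']])
    # Every emotion/truth pair we want as a column, in the desired order.
    all_columns = pd.MultiIndex.from_product([emotions, truths],
                                             names=['B', 'C'])
    # Add the missing rows and columns, filling them with zero counts.
    return counts.reindex(index=ids, columns=all_columns, fill_value=0)


result = full_crosstab(df, ids, emotions, truths)
print(result)
```

Passing the full lists explicitly keeps the row and column order under your control, instead of relying on whatever happens to be present in the sample.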