I have a DataFrame, sega_df:
Character Month Code
Sonic 18-Jan P008924
Shadow 18-Jan P007869
Sonic 18-Feb P007811
Sonic 18-Feb P008639
Sonic 18-Mar P008242
Sonic 18-Mar P007823
Sonic 18-Mar P007823
Sonic 18-Mar P008380
Sonic 18-Apr P008637
Shadow 18-Apr P008266
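In my desired output, I want to count the number of unique codes per character per month. For example, for Sonic in March I want to see a total of 3 rather than 4, because P007823 appears twice and he therefore has only three distinct codes that month. My desired output is: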
Jan 18 Feb 18 Mar 18 Apr 18
Character
Sonic 1.0 2.0 3.0 1.0
Shadow 1.0 0.0 0.0 1.0
I tried .count() and .unstack(), and considered using .sum() at the end. My code so far is:
sega_pivot = sega_df.groupby(['Character','Month']).count().unstack()
Answer 0 (score: 0)
Option 1
Use drop_duplicates, then apply your approach:
df = df.drop_duplicates('Code').groupby(['Character', 'Month']).count().unstack().fillna(0)
df.columns = df.columns.droplevel()
Month 18-Apr 18-Feb 18-Jan 18-Mar
Character
Shadow 1.0 0.0 1.0 0.0
Sonic 1.0 2.0 1.0 3.0
Option 2
Use pivot_table with aggfunc and unique:
df.pivot_table(index='Character', columns='Month', values='Code', aggfunc=lambda x: len(x.unique())).fillna(0)
Month 18-Apr 18-Feb 18-Jan 18-Mar
Character
Shadow 1.0 0.0 1.0 0.0
Sonic 1.0 2.0 1.0 3.0
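A closely related variant, not taken from either answer, is to call nunique on the grouped Code column. The sketch below rebuilds the question's sample data by hand (the literal values are copied from the table above) and counts distinct codes within each Character/Month group, so a code that recurs in another month or under another character is still counted there. The variable name unique_counts is illustrative only.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
sega_df = pd.DataFrame({
    "Character": ["Sonic", "Shadow", "Sonic", "Sonic", "Sonic",
                  "Sonic", "Sonic", "Sonic", "Sonic", "Shadow"],
    "Month": ["18-Jan", "18-Jan", "18-Feb", "18-Feb", "18-Mar",
              "18-Mar", "18-Mar", "18-Mar", "18-Apr", "18-Apr"],
    "Code": ["P008924", "P007869", "P007811", "P008639", "P008242",
             "P007823", "P007823", "P008380", "P008637", "P008266"],
})

# Count distinct codes per (Character, Month), then spread months into columns.
# fill_value=0 fills combinations that never occur (e.g. Shadow in 18-Feb).
unique_counts = (
    sega_df.groupby(["Character", "Month"])["Code"]
    .nunique()
    .unstack(fill_value=0)
)

# For comparison: a plain count() still counts the repeated P007823 row.
raw_counts = sega_df.groupby(["Character", "Month"])["Code"].count()

print(unique_counts)
```

Because fill_value is passed to unstack rather than filling afterwards, the counts should stay integers instead of being converted to floats.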
Answer 1 (score: 0)
Using crosstab:
df=df.drop_duplicates(['Character','Code'])
pd.crosstab(df.Character,df.Month)
Out[166]:
Month 18-Apr 18-Feb 18-Jan 18-Mar
Character
Shadow 1 0 1 0
Sonic 1 2 1 3
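The outputs above list the months alphabetically (18-Apr, 18-Feb, ...), whereas the desired output in the question runs January to April. One possible way to get that order, sketched here under the assumption that the rows of sega_df already appear in chronological order, is to reindex the result by order of first appearance. The names deduped, month_order, and character_order are hypothetical helpers, not part of the answers.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
sega_df = pd.DataFrame({
    "Character": ["Sonic", "Shadow", "Sonic", "Sonic", "Sonic",
                  "Sonic", "Sonic", "Sonic", "Sonic", "Shadow"],
    "Month": ["18-Jan", "18-Jan", "18-Feb", "18-Feb", "18-Mar",
              "18-Mar", "18-Mar", "18-Mar", "18-Apr", "18-Apr"],
    "Code": ["P008924", "P007869", "P007811", "P008639", "P008242",
             "P007823", "P007823", "P008380", "P008637", "P008266"],
})

# Drop rows that repeat the same code for the same character and month.
deduped = sega_df.drop_duplicates(["Character", "Month", "Code"])
table = pd.crosstab(deduped["Character"], deduped["Month"])

# Assumption: the input rows are already in chronological order, so the
# order of first appearance is the order we want for rows and columns.
month_order = sega_df["Month"].unique().tolist()
character_order = sega_df["Character"].unique().tolist()
table = table.reindex(index=character_order, columns=month_order)

print(table)
```

If the rows are not in chronological order, month_order could instead be written out by hand as an explicit list.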