After grouping a pandas DataFrame, we can call three methods on the result: size(), nunique(), and count(). What is the difference between them?
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'key2': ['one', 'two', 'one', 'two', 'one'],
    'data1': np.random.randn(5),
    'data2': np.random.randn(5)
})
grouped = df['data1'].groupby(df['key1'])
print(grouped.size(), grouped.nunique(), grouped.count())
All three return the same result.
Answer 0 (score: 1)
For example, suppose your df looks like this:
df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'data1': [1, 1, np.nan, 1, 2]
})
grouped = df['data1'].groupby(df['key1'])
grouped.size()  # number of rows in each group, including NaN values
Out[413]:
key1
a 3
b 2
Name: data1, dtype: int64
grouped.count()  # number of non-NaN values in each group; the np.nan in group b is ignored
Out[414]:
key1
a 3
b 1
Name: data1, dtype: int64
grouped.nunique()  # number of distinct non-NaN values: group a holds 1, 1, 2, so 2; group b holds NaN and 1, so 1
Out[415]:
key1
a 2
b 1
Name: data1, dtype: int64
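To compare the three results side by side, here is a minimal sketch built on the same example data. The `summary` frame and its column names are only illustrative and not part of the original answer. The last column uses the `dropna=False` option of `nunique()` to show how the count changes when NaN is treated as a distinct value.

```python
import numpy as np
import pandas as pd

# Same data as in the answer above: group b contains one NaN.
df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'data1': [1, 1, np.nan, 1, 2],
})
grouped = df['data1'].groupby(df['key1'])

# Collect the results in one table (column names are illustrative).
summary = pd.DataFrame({
    'size': grouped.size(),                             # rows per group, NaN included
    'count': grouped.count(),                           # non-NaN values per group
    'nunique': grouped.nunique(),                       # distinct non-NaN values per group
    'nunique_with_nan': grouped.nunique(dropna=False),  # NaN counted as a distinct value
})
print(summary)
```

In the question's data the three methods agree because np.random.randn produces no NaN values and, in practice, no repeated values, so every row is both non-NaN and distinct.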