What is the difference between count(), size(), and unique() in pandas?

Asked: 2018-04-27 16:06:41

Tags: python pandas

After grouping a pandas DataFrame, we can call any of these three methods on the result. What is the difference between them?

import pandas as pd
import numpy as np

df=pd.DataFrame({
    'key1':['a','a','b','b','a'],
    'key2':['one','two','one','two','one'],
    'data1':np.random.randn(5),
    'data2':np.random.randn(5)
})
grouped=df['data1'].groupby(df['key1'])
print(grouped.size(),grouped.nunique(),grouped.count())

All three print the same result.

1 answer:

Answer 0: (score: 1)

For example, suppose your df looks like this:

df=pd.DataFrame({
    'key1':['a','a','b','b','a'],
    'data1':[1,1,np.nan,1,2]
})
grouped=df['data1'].groupby(df['key1'])


grouped.size()  # returns the number of rows in each group, including NaN values

Out[413]:
key1
a    3
b    2
Name: data1, dtype: int64



grouped.count()  # counts only non-NaN values, so the np.nan in group b is ignored
Out[414]:
key1
a    3
b    1
Name: data1, dtype: int64

grouped.nunique()  # counts distinct non-NaN values: group a has 1 and 2, so 2; group b has NaN and 1, so 1
Out[415]: 
key1
a    2
b    1
Name: data1, dtype: int64
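
To compare the three results at a glance, here is a minimal sketch that puts them side by side in one table, using the same df as above. The variable name summary and the column labels are my own choices for illustration. The last column uses the dropna=False option of nunique(), which I am adding here to show what changes when NaN is treated as a value of its own.

```python
import numpy as np
import pandas as pd

# Same data as in the answer: group b contains one NaN.
df = pd.DataFrame({
    'key1': ['a', 'a', 'b', 'b', 'a'],
    'data1': [1, 1, np.nan, 1, 2],
})
grouped = df['data1'].groupby(df['key1'])

# One column per method; every result is indexed by key1, so they align.
summary = pd.DataFrame({
    'size': grouped.size(),          # rows per group, NaN included
    'count': grouped.count(),        # non-NaN values per group
    'nunique': grouped.nunique(),    # distinct non-NaN values per group
    'nunique_with_nan': grouped.nunique(dropna=False),  # NaN counted as a distinct value
})
print(summary)
```

For group b the row reads 2, 1, 1, 2: two rows, one non-NaN value, one distinct non-NaN value, and two distinct values once NaN is counted. In the question's data there are no NaN values and no repeated random numbers within a group, which is why all three methods gave the same numbers there.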