Counting elements in pandas

Time: 2017-02-12 12:02:57

Tags: python pandas data-analysis

Suppose I have a pandas DataFrame like this:

import pandas as pd


# Each row is a Series built from a dict (key: value pairs), so the keys become column names
a=pd.Series({'Country':'Italy','Name':'Augustina','Gender':'Female','Number':1})
b=pd.Series({'Country':'Italy','Name':'Piero','Gender':'Male','Number':2})
c=pd.Series({'Country':'Italy','Name':'Carla','Gender':'Female','Number':3})
d=pd.Series({'Country':'Italy','Name':'Roma','Gender':'Female','Number':4})
e=pd.Series({'Country':'Greece','Name':'Sophia','Gender':'Female','Number':5})
f=pd.Series({'Country':'Greece','Name':'Zeus','Gender':'Male','Number':6})

df=pd.DataFrame([a,b,c,d,e,f])

Then I set a MultiIndex on it, like this:

df.set_index(['Country','Gender'],inplace=True)

Now I would like to know how to count the number of people from Italy, or how many Greek women I have in the DataFrame.

I tried

df['Italy'].count()

df['Greece']['Female'].count()

Neither of them works.

Thanks

1 Answer:

Answer 0: (score: 8)

I think you need groupby with the size aggregation:

See also: What is the difference between size and count in pandas?
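
To make the size/count distinction concrete, here is a minimal sketch. The toy frame `people` and its missing Name are made up for illustration and are not part of the question's data: size counts rows per group, while count counts non-null values per column.

```python
import pandas as pd

# Hypothetical toy data: one Name is missing on purpose.
people = pd.DataFrame({
    'Country': ['Italy', 'Italy', 'Greece'],
    'Name': ['Piero', None, 'Zeus'],
})

grouped = people.groupby('Country')

# size() counts rows in each group, missing values included; it returns a Series.
sizes = grouped.size()

# count() counts non-null values in each remaining column; it returns a DataFrame.
counts = grouped.count()

print(sizes)
print(counts)
```

For "how many people", size is usually the one you want, because a missing value in some column should not make a person disappear from the total.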

a=pd.DataFrame([{'Country':'Italy','Name':'Augustina','Gender':'Female','Number':1}])
b=pd.DataFrame([{'Country':'Italy','Name':'Piero','Gender':'Male','Number':2}])
c=pd.DataFrame([{'Country':'Italy','Name':'Carla','Gender':'Female','Number':3}])
d=pd.DataFrame([{'Country':'Italy','Name':'Roma','Gender':'Female','Number':4}])
e=pd.DataFrame([{'Country':'Greece','Name':'Sophia','Gender':'Female','Number':5}])
f=pd.DataFrame([{'Country':'Greece','Name':'Zeus','Gender':'Male','Number':6}])
df=pd.concat([a,b,c,d,e,f], ignore_index=True)
print (df)
  Country  Gender       Name  Number
0   Italy  Female  Augustina       1
1   Italy    Male      Piero       2
2   Italy  Female      Carla       3
3   Italy  Female       Roma       4
4  Greece  Female     Sophia       5
5  Greece    Male       Zeus       6

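# Store the results under new names so that df itself stays a DataFrame
country_counts = df.groupby('Country').size()
print (country_counts)
Country
Greece    2
Italy     4
dtype: int64
country_gender_counts = df.groupby(['Country', 'Gender']).size()
print (country_gender_counts)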
Country  Gender
Greece   Female    1
         Male      1
Italy    Female    3
         Male      1
dtype: int64
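
If you only want the two numbers from the question, you can probably just look them up in the groupby result. A minimal sketch, rebuilding the same data in one step; the names `by_country` and `by_country_gender` are only illustrative:

```python
import pandas as pd

# Same data as above, built in one step.
df = pd.DataFrame({
    'Country': ['Italy', 'Italy', 'Italy', 'Italy', 'Greece', 'Greece'],
    'Name': ['Augustina', 'Piero', 'Carla', 'Roma', 'Sophia', 'Zeus'],
    'Gender': ['Female', 'Male', 'Female', 'Female', 'Female', 'Male'],
    'Number': [1, 2, 3, 4, 5, 6],
})

by_country = df.groupby('Country').size()
by_country_gender = df.groupby(['Country', 'Gender']).size()

# A plain label picks one country ...
italians = by_country.loc['Italy']

# ... and a tuple picks one (Country, Gender) pair from the MultiIndex.
greek_women = by_country_gender.loc[('Greece', 'Female')]

print(italians, greek_women)
```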

If you need only some of the sizes, select from the MultiIndex with xs or slicers:

df.set_index(['Country','Gender'],inplace=True)
print (df)
                     Name  Number
Country Gender                   
Italy   Female  Augustina       1
        Male        Piero       2
        Female      Carla       3
        Female       Roma       4
Greece  Female     Sophia       5
        Male         Zeus       6
print (df.xs('Italy', level='Country'))
             Name  Number
Gender                   
Female  Augustina       1
Male        Piero       2
Female      Carla       3
Female       Roma       4

print (len(df.xs('Italy', level='Country').index))
4

print (df.xs(('Greece', 'Female'), level=('Country', 'Gender')))
                  Name  Number
Country Gender                
Greece  Female  Sophia       5

print (len(df.xs(('Greece', 'Female'), level=('Country', 'Gender')).index))
1
# Slicers need a lexsorted index; without sort_index, pandas raises
# KeyError: 'MultiIndex Slicing requires
# the index to be fully lexsorted tuple len (2), lexsort depth (0)'
df.sort_index(inplace=True)
idx = pd.IndexSlice
print (df.loc[idx['Italy', :],:])
                     Name  Number
Country Gender                   
Italy   Female  Augustina       1
        Female      Carla       3
        Female       Roma       4
        Male        Piero       2

print (len(df.loc[idx['Italy', :],:].index))
4

print (df.loc[idx['Greece', 'Female'],:])
                  Name  Number
Country Gender                
Greece  Female  Sophia       5

print (len(df.loc[idx['Greece', 'Female'],:].index))
1
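
As a side note on why the attempts in the question fail: df['Italy'] looks for a column named 'Italy', not an index label, so it raises KeyError. Below is a rough sketch of two ways to count directly on the index levels once Country and Gender are in the MultiIndex. The variable names are illustrative, and this is one possible approach rather than the only one.

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['Italy', 'Italy', 'Italy', 'Italy', 'Greece', 'Greece'],
    'Name': ['Augustina', 'Piero', 'Carla', 'Roma', 'Sophia', 'Zeus'],
    'Gender': ['Female', 'Male', 'Female', 'Female', 'Female', 'Male'],
    'Number': [1, 2, 3, 4, 5, 6],
}).set_index(['Country', 'Gender'])

# df[...] is a column lookup, and there is no column called 'Italy'.
try:
    df['Italy']
    column_lookup_failed = False
except KeyError:
    column_lookup_failed = True

# Group on the index levels instead of on columns.
level_counts = df.groupby(level=['Country', 'Gender']).size()

# Boolean mask on one index level; this does not need a sorted index.
italians = (df.index.get_level_values('Country') == 'Italy').sum()

print(column_lookup_failed)
print(level_counts)
print(italians)
```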