Get value_counts from a partial categorical column

Date: 2019-03-18 17:37:44

Tags: python pandas

I'm trying to get value_counts from a Categorical column (specifically, one containing month names) using pandas (v0.23.4). This works fine when all categories are present:

import calendar
import random

import pandas as pd

random.seed(1)

month_names = calendar.month_name[1:]
month_names += month_names

df1 = pd.DataFrame({
    'Month': month_names,
    'Flag': [random.choice([True, False]) for _ in month_names]
})

df1['Month'] = pd.Categorical(
    df1['Month'], categories=calendar.month_name[1:], ordered=True
)
print(df1.groupby('Month')['Flag'].value_counts())

This prints, as expected:

Month      Flag 
January    False    2
February   True     2
March      False    2
April      True     2
May        True     2
June       False    2
July       False    1
           True     1
August     False    1
           True     1
September  False    2
October    True     2
November   False    1
           True     1
December   False    2
Name: Flag, dtype: int64

However, if the 'Month' column does not contain every possible category, pandas raises a ValueError. For example:

month_names = ['January', 'February', 'March']
month_names += month_names

df2 = pd.DataFrame({
    'Month': month_names,
    'Flag': [random.choice([True, False]) for _ in month_names]
})

df2['Month'] = pd.Categorical(
    df2['Month'], categories=calendar.month_name[1:], ordered=True
)
print(df2.groupby('Month')['Flag'].value_counts())

raises:

ValueError: operands could not be broadcast together with shape (12,) (3,)

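Is there any way to get correct value_counts results from partial data like this? Ideally the result would keep all the categories, but even without them it would be a start.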
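One alternative avoids calling value_counts() on the grouped column altogether: group on both Month and Flag, then count rows with size(). This technique is swapped in and does not come from the original post. The sketch below makes two assumptions. First, it relies on the observed parameter of groupby, which exists since pandas 0.23; the code has not been checked against 0.23.4 specifically. Second, it rebuilds df2 with hypothetical fixed Flag values instead of random.choice, so the counts are reproducible.

```python
import calendar

import pandas as pd

# Hypothetical fixed flags in place of random.choice, so counts are reproducible.
month_names = ['January', 'February', 'March'] * 2
flags = [False, True, False, True, True, False]

df2 = pd.DataFrame({'Month': month_names, 'Flag': flags})
df2['Month'] = pd.Categorical(
    df2['Month'], categories=calendar.month_name[1:], ordered=True
)

# Group on both columns and count rows per group. observed=False asks pandas
# to include every Month category, so months with no rows get a count of 0.
counts = df2.groupby(['Month', 'Flag'], observed=False).size()
print(counts)
```

The result has one row per (month, flag) pair: 12 categories times the two Flag values that occur. Unused months show up with a count of 0.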

1 Answer

Answer 0 (score: 2)

If you only need the observed categories, you can use the observed parameter of groupby:

print(df2.groupby('Month', observed=True)['Flag'].value_counts())
#Month     Flag
#January   False    1
#          True     1
#February  True     2
#March     False    2
#Name: Flag, dtype: int64

To get all the categories in the groupby result, one workaround is to use crosstab and then reindex with all the categories. Honestly, I'm not sure why the original value_counts() raises an error with GroupBy (other methods work fine). The crosstab result can be turned back into value_counts-style output with stack, which moves the Flag column into a level of the MultiIndex.
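The post does not include the code for the crosstab, reindex and stack steps, so the following is a minimal reconstruction under stated assumptions rather than the answerer's exact code. It rebuilds df2 with hypothetical fixed Flag values instead of random.choice, chosen so the observed counts match the output shown above. It also reindexes the Flag columns, so that both False and True always appear even if one of them never occurs in the data.

```python
import calendar

import pandas as pd

# Hypothetical fixed flags in place of random.choice, so counts are reproducible.
month_names = ['January', 'February', 'March'] * 2
flags = [False, True, False, True, True, False]

df2 = pd.DataFrame({'Month': month_names, 'Flag': flags})
df2['Month'] = pd.Categorical(
    df2['Month'], categories=calendar.month_name[1:], ordered=True
)

# Month x Flag table of row counts.
table = pd.crosstab(df2['Month'], df2['Flag'])

# Add every month (and both flag values) that never occurs, with a count of 0.
table = table.reindex(
    index=calendar.month_name[1:], columns=[False, True], fill_value=0
)

# stack moves the Flag columns into the second MultiIndex level,
# giving the same shape as groupby(...)['Flag'].value_counts().
counts = table.stack()
print(counts)
```

Newer pandas versions may emit a FutureWarning from stack() about its implementation change. The resulting values are unaffected.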