I am trying to get value_counts from a Categorical column (specifically one containing month information) in pandas (v0.23.4). This works fine when all categories are present:
import calendar
import random
import pandas as pd
random.seed(1)
month_names = calendar.month_name[1:]
month_names += month_names
df1 = pd.DataFrame({
'Month': month_names,
'Flag': [random.choice([True, False]) for _ in month_names]
})
df1['Month'] = pd.Categorical(
df1['Month'], categories=calendar.month_name[1:], ordered=True
)
print(df1.groupby('Month')['Flag'].value_counts())
This prints, as expected:
Month Flag
January False 2
February True 2
March False 2
April True 2
May True 2
June False 2
July False 1
True 1
August False 1
True 1
September False 2
October True 2
November False 1
True 1
December False 2
Name: Flag, dtype: int64
However, if our 'Month' column does not contain all possible categories, pandas throws a ValueError. For example:
month_names = ['January', 'February', 'March']
month_names += month_names
df2 = pd.DataFrame({
'Month': month_names,
'Flag': [random.choice([True, False]) for _ in month_names]
})
df2['Month'] = pd.Categorical(
df2['Month'], categories=calendar.month_name[1:], ordered=True
)
print(df2.groupby('Month')['Flag'].value_counts())
This raises:
ValueError: operands could not be broadcast together with shape (12,) (3,)
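Is there any way to get correct value_counts results from partial data? Ideally this would preserve all categories, but even a result without them would be a start.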
Answer 0 (score: 2)
If you only need the observed categories, you can use the observed keyword:
print(df2.groupby('Month', observed=True)['Flag'].value_counts())
#Month Flag
#January False 1
# True 1
#February True 2
#March False 2
#Name: Flag, dtype: int64
To get all of the values in the groupby, one workaround is to use crosstab and then reindex with all of the categories. I'm honestly not sure why the original value_counts() raises an error with GroupBy (other methods work fine), but the workaround gives the full result once stack is used to turn the crosstab columns into a MultiIndex level.
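The answer's own code for the crosstab, reindex, and stack workaround did not survive in this copy. The following is only a minimal sketch of that idea, not the answerer's original snippet. The sample data, the variable names (`months`, `df`, `counts`), and the choice of `fill_value=0` are assumptions made for illustration.

```python
import calendar

import pandas as pd

# All twelve month names, used both as the categories and as the reindex target.
months = list(calendar.month_name[1:])

# Hypothetical sample data: only three of the twelve months are observed.
df = pd.DataFrame({
    "Month": ["January", "February", "March"] * 2,
    "Flag": [True, False, True, True, False, False],
})
df["Month"] = pd.Categorical(df["Month"], categories=months, ordered=True)

counts = (
    pd.crosstab(df["Month"], df["Flag"])  # rows: months, columns: Flag values
    .reindex(months, fill_value=0)        # make sure every category has a row
    .stack()                              # move Flag into the index (MultiIndex)
)
print(counts)
```

The result is a Series indexed by (Month, Flag) that includes zero counts for months with no rows. Reindexing explicitly avoids relying on how a particular pandas version's crosstab treats unobserved categories.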