Here is a sample dataset:
   customer_number ethnicity fiscal_quarter  fiscal_year
1              231     Black      Quarter 1         2016
2              451     White      Quarter 1         2016
3              345     White      Quarter 1         2016
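I want to filter the ethnicity column for 'Asian', group by fiscal_year and fiscal_quarter, and count the unique customer_number values. If there are no results for 'Asian', the result should still be a data frame like the one below.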
   customer_number fiscal_quarter  fiscal_year
1                0      Quarter 1         2016
Answer 0 (score: 1)
Short answer
import numpy as np
import pandas as pd

# make column `Categorical`, include `'Asian'` as one of the categories
e = df.ethnicity
df['ethnicity'] = pd.Categorical(e, categories=np.append('Asian', e.unique()))

# simple function to be applied. performs 2nd level of `groupby`
# observed=False keeps categories that have no rows in the group
def f(df):
    s = df.groupby('ethnicity', observed=False).customer_number.nunique()
    return s.loc['Asian']

# initial `groupby`
d = df.groupby(['fiscal_year', 'fiscal_quarter']).apply(f)
d.reset_index(name='nunique')
   fiscal_year fiscal_quarter  nunique
0         2016      Quarter 1        0
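Explanation
The way to get groupby to produce aggregated results for groups that do not exist in the data is to define the grouping column as 'Categorical', with categories that include the missing value. pandas will then include that category in the aggregated results.
However, I could not get groupby to work with 3 different columns and keep that same convenience, so I had to split the grouping into 2 steps:
1. groupby the columns that are not 'Categorical', namely ['fiscal_year', 'fiscal_quarter'].
2. apply to the groupby from step 1 a function that performs a simple groupby on just ethnicity. This maintains the desired behavior and reports all categories, whether or not they are represented in the data.
Keeping all categories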
e = df.ethnicity
df['ethnicity'] = pd.Categorical(
    e, categories=np.append(['Asian', 'Hispanic'], e.unique()))

# observed=False keeps categories that have no rows in the group
def f(df):
    return df.groupby('ethnicity', observed=False).customer_number.nunique()

d = df.groupby(['fiscal_year', 'fiscal_quarter']).apply(f)
d.stack().reset_index(name='nunique')
   fiscal_year fiscal_quarter ethnicity  nunique
0         2016      Quarter 1     Asian        0
1         2016      Quarter 1  Hispanic        0
2         2016      Quarter 1     Black        1
3         2016      Quarter 1     White        1
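The Categorical behavior described in the explanation can be seen in isolation with a minimal sketch. It assumes a pandas version that supports the observed argument of groupby (0.23 or later); the small frame and the variable name counts are made up for illustration.

```python
import pandas as pd

# Three customers, none of them 'Asian'
df = pd.DataFrame({
    'customer_number': [231, 451, 345],
    'ethnicity': ['Black', 'White', 'White'],
})

# Declare 'Asian' as a valid category even though no row carries it
df['ethnicity'] = pd.Categorical(
    df['ethnicity'], categories=['Asian', 'Black', 'White'])

# observed=False asks groupby to report every declared category,
# including the ones with no rows
counts = df.groupby('ethnicity', observed=False)['customer_number'].nunique()
print(counts)
```

Here 'Asian' shows up in the result with a count of 0, which is the behavior the two-step groupby relies on.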
Answer 1 (score: 0)
If I understand what you are looking for, the following should do it:
import pandas as pd
# Generate data
d = {'customer_number': [231, 451, 345, 236, 457, 354],
     'ethnicity': ['Black', 'White', 'White', 'Black', 'White', 'White'],
     'fiscal_quarter': ['Quarter 1', 'Quarter 1', 'Quarter 1', 'Quarter 3', 'Quarter 3', 'Quarter 1'],
     'fiscal_year': [2016, 2016, 2016, 2015, 2015, 2017]}
df = pd.DataFrame(d)
# Helper function to determine subset of
# dataframe that meets ethnicity condition
def find_ethnicity(dff, ethnicity):
    count = dff.customer_number[dff.ethnicity.eq(ethnicity)].nunique()
    if count == 0:
        dff = dff.head(1).copy()
    else:
        dff = dff[dff.ethnicity.eq(ethnicity)].copy().head(1)
    dff['ethnicity'] = ethnicity
    dff['customer_number'] = count
    return dff
# Test with ethnicity 'Black' grouping by fiscal_year and fiscal_quarter
print(df.groupby(['fiscal_year', 'fiscal_quarter'], as_index=False).apply(find_ethnicity, 'Black').reset_index(drop=True))
# customer_number ethnicity fiscal_quarter fiscal_year
# 0 1 Black Quarter 3 2015
# 1 1 Black Quarter 1 2016
# 2 0 Black Quarter 1 2017
# Test with ethnicity 'Asian' grouping by fiscal_year and fiscal_quarter
print(df.groupby(['fiscal_year', 'fiscal_quarter'], as_index=False).apply(find_ethnicity, 'Asian').reset_index(drop=True))
# customer_number ethnicity fiscal_quarter fiscal_year
# 0 0 Asian Quarter 3 2015
# 1 0 Asian Quarter 1 2016
# 2 0 Asian Quarter 1 2017
# Test with ethnicity 'White' grouping by fiscal_year and fiscal_quarter
print(df.groupby(['fiscal_year', 'fiscal_quarter'], as_index=False).apply(find_ethnicity, 'White').reset_index(drop=True))
# customer_number ethnicity fiscal_quarter fiscal_year
# 0 1 White Quarter 3 2015
# 1 2 White Quarter 1 2016
# 2 1 White Quarter 1 2017
# Test with ethnicity 'Latino' grouping by fiscal_year and fiscal_quarter
print(df.groupby(['fiscal_year', 'fiscal_quarter'], as_index=False).apply(find_ethnicity, 'Latino').reset_index(drop=True))
# customer_number ethnicity fiscal_quarter fiscal_year
# 0 0 Latino Quarter 3 2015
# 1 0 Latino Quarter 1 2016
# 2 0 Latino Quarter 1 2017
# Test with ethnicity 'Asian' without grouping
print(find_ethnicity(df, 'Asian'))
# customer_number ethnicity fiscal_quarter fiscal_year
# 0 0 Asian Quarter 1 2016
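As a possible alternative to the helper above, the same counts can be sketched with masking instead of apply: customer numbers whose ethnicity does not match become NaN, and nunique ignores NaN, so groups with no match count 0. The function name count_unique_for and the frame used here are assumptions for this sketch only.

```python
import pandas as pd

df = pd.DataFrame({
    'customer_number': [231, 451, 345, 236, 457, 354],
    'ethnicity': ['Black', 'White', 'White', 'Black', 'White', 'White'],
    'fiscal_quarter': ['Quarter 1', 'Quarter 1', 'Quarter 1',
                       'Quarter 3', 'Quarter 3', 'Quarter 1'],
    'fiscal_year': [2016, 2016, 2016, 2015, 2015, 2017],
})


def count_unique_for(frame, ethnicity):
    # Hypothetical helper: blank out customers of other ethnicities (NaN),
    # then count unique non-NaN customer numbers per year/quarter group.
    masked = frame['customer_number'].where(frame['ethnicity'].eq(ethnicity))
    counts = masked.groupby(
        [frame['fiscal_year'], frame['fiscal_quarter']]
    ).nunique()
    return counts.reset_index(name='customer_number')


asian = count_unique_for(df, 'Asian')
black = count_unique_for(df, 'Black')
print(asian)
print(black)
```

Because every row stays in its group, each year/quarter combination present in the data appears in the result even when the count is 0. Unlike the helper above, this sketch does not add an ethnicity column to the output.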
I hope this proves useful.