Question

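I have a df containing 'age_bracket', with values of the form 20-25, 25-30, and so on, and 'no_Show', whose values are only 0 or 1 and indicate whether or not the patient showed up for the appointment.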

To create a bar chart, I need the total and proportional values of show vs. no-show for each age bracket. I tried this:

noshow_counts = df.groupby('age_bracket')['no_Show'].value_counts()[1]
show_counts = df.groupby('age_bracket')['no_Show'].value_counts()[0]
age_totals = df.groupby('age_bracket').count()['no_Show']

and calculated the proportions like this:

nowshow_proportions = noshow_counts / age_totals
show_proportions = show_counts /age_totals

Here is how they are used in the bar chart:

# Bar chart
ind = np.arange(len(nowshow_proportions))
width = 0.40
# plot bars
noshow_bars = plt.bar(ind, nowshow_proportions, width, color='g',
                      alpha=.7, label='No Show')
show_bar = plt.bar(ind + width, show_proportions, width, color='b',
                   alpha=.7, label='Show')

This does not produce the correct values. I am guessing this is because value_counts returns an object rather than a Series, so this is incorrect:

 noshow_counts = df.groupby('age_bracket')['no_Show'].value_counts()[1]
 show_counts = df.groupby('age_bracket')['no_Show'].value_counts()[0]

Is there a way to select only the '1' values and only the '0' values and get a Series back?

Answer 1

I was able to get the correct result with:

show_counts = df[df['no_Show'] == 0].groupby('age_bracket').count()['no_Show']
noshow_counts = df[df['no_Show'] == 1].groupby('age_bracket').count()['no_Show']
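
A likely reason the original attempt failed: value_counts() on a grouped column returns a Series indexed by (age_bracket, no_Show) pairs, so indexing it with [1] is not interpreted as "rows where no_Show == 1" (the exact behaviour depends on the pandas version). As a minimal sketch of an alternative, the no_Show level can be selected explicitly with xs, or pivoted into columns with unstack. The sample data below is hypothetical and only there to make the example runnable; the column names follow the code in the question.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question's code.
df = pd.DataFrame({
    "age_bracket": ["20-25", "20-25", "20-25", "25-30", "25-30", "30-35"],
    "no_Show":     [0,       1,       0,       1,       1,       0],
})

# value_counts() on a grouped column returns a Series with a two-level
# MultiIndex: (age_bracket, no_Show).
counts = df.groupby("age_bracket")["no_Show"].value_counts()

# Option 1: select one value of the no_Show level. The result is a Series
# indexed by age_bracket, but brackets with no matching rows are absent.
noshow_only = counts.xs(1, level="no_Show")

# Option 2: pivot the no_Show level into columns, filling missing
# combinations with 0 so every bracket appears in both columns.
table = counts.unstack(fill_value=0)

show_counts = table[0]            # Series indexed by age_bracket
noshow_counts = table[1]
age_totals = table.sum(axis=1)    # total appointments per bracket

show_proportions = show_counts / age_totals
noshow_proportions = noshow_counts / age_totals
```

With unstack(fill_value=0), both Series share the same index, so they line up bar for bar in the chart. The boolean-filter approach above, like xs, drops any bracket that has no rows for the selected value, which can shift the bars out of alignment.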

Python / Pandas: Selecting specific values and returning a Series
