Pandas groupby count under multiple specific conditions

Time: 2020-04-07 11:01:34

Tags: pandas pandas-groupby

I have a data frame as shown below.

Doctor       Appointment           B_ID       No_Show   
  A          2020-01-18 12:00:00     1          0.2
  A          2020-01-18 12:30:00     2          0.3
  A          2020-01-18 13:00:00     3          0.8
  A          2020-01-18 13:00:00     4          0.3
  A          2020-01-18 13:30:00     5          0.6
  A          2020-01-18 14:00:00     6          0.8
  A          2020-01-18 14:00:00     7          0.9
  A          2020-01-18 14:00:00     8          0.4
  A          2020-01-18 14:00:00     9          0.6
  A          2020-01-19 12:00:00     12         0.9
  A          2020-01-19 12:00:00     13         0.5
  A          2020-01-19 13:00:00     14         0.3
  A          2020-01-19 13:00:00     15         0.7
  A          2020-01-19 14:00:00     16         0.6
  A          2020-01-19 14:00:00     17         0.8
  A          2020-01-19 14:00:00     19         0.3

No_Show = the probability that the patient will not show up.

From the above, I would like to prepare the data frame below.

Expected output:

Doctor  Appointment        B_ID   No_Show   Session  slot_num   Patient_count
  A    2020-01-18 12:00:00   1     0.2       S1      1          1
  A    2020-01-18 12:30:00   2     0.3       S1      2          1
  A    2020-01-18 13:00:00   3     0.8       S1      3          1
  A    2020-01-18 13:00:00   4     0.3       S1      3          2
  A    2020-01-18 13:30:00   5     0.6       S1      4          1
  A    2020-01-18 14:00:00   6     0.8       S1      5          1
  A    2020-01-18 14:00:00   7     0.9       S1      5          2
  A    2020-01-18 14:00:00   8     0.4       S1      5          3
  A    2020-01-18 14:00:00   9     0.6       S1      5          4
  A    2020-01-19 12:00:00   12    0.9       S2      1          1
  A    2020-01-19 12:00:00   13    0.5       S2      1          2
  A    2020-01-19 12:30:00   14    0.3       S2      2          1
  A    2020-01-19 13:00:00   15    0.7       S2      3          1
  A    2020-01-19 13:30:00   15    0.7       S2      4          1
  A    2020-01-19 14:00:00   16    0.6       S2      5          1
  A    2020-01-19 14:00:00   17    0.8       S2      5          2
  A    2020-01-19 14:00:00   19    0.3       S2      5          3

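Explanation:

Session = each day is considered one session.

slot_num = the slot within that day (assume each slot lasts 30 minutes).

Patient_count = the number of patients in the same session and the same slot.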

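1 answer:

Answer 0: (score: 3)

For Session, use factorize on the dates, add the prefix S, and convert the result to a Series of strings. A similar idea is used in a custom function passed to GroupBy.transform for slot_num. For Patient_count, the code uses GroupBy.cumcount, grouping additionally by the new slot_num column.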
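As a quick illustration of what factorize returns, here is a minimal sketch. The values in `s` are hypothetical and chosen only to show the numbering: each distinct value gets an integer code in order of first appearance, starting at 0, which is why the answer adds 1.

```python
import pandas as pd

# Hypothetical values, used only to show how the codes are assigned.
s = pd.Series(["b", "a", "b", "c"])

# codes: one integer per row; uniques: the distinct values in order of first appearance.
codes, uniques = pd.factorize(s)

print(codes.tolist())   # [0, 1, 0, 2]
print(list(uniques))    # ['b', 'a', 'c']
```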

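import pandas as pd

# df is the data frame from the question
df['Appointment'] = pd.to_datetime(df['Appointment'])
dates = df['Appointment'].dt.date

# Session: number the distinct dates in order of appearance, prefixed with 'S'
df['Session'] = 'S' + pd.Series(dates.factorize()[0] + 1, index=df.index).astype(str)
# slot_num: number the distinct appointment times within each doctor and session
f = lambda x: pd.factorize(x)[0]
df['slot_num'] = df.groupby(['Doctor', 'Session'])['Appointment'].transform(f) + 1
# Patient_count: running count of patients within each slot
df['Patient_count'] = df.groupby(['Doctor', 'Session', 'slot_num']).cumcount() + 1
print(df)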
   Doctor         Appointment  B_ID  No_Show Session  slot_num  Patient_count
0       A 2020-01-18 12:00:00     1      0.2      S1         1              1
1       A 2020-01-18 12:30:00     2      0.3      S1         2              1
2       A 2020-01-18 13:00:00     3      0.8      S1         3              1
3       A 2020-01-18 13:00:00     4      0.3      S1         3              2
4       A 2020-01-18 13:30:00     5      0.6      S1         4              1
5       A 2020-01-18 14:00:00     6      0.8      S1         5              1
6       A 2020-01-18 14:00:00     7      0.9      S1         5              2
7       A 2020-01-18 14:00:00     8      0.4      S1         5              3
8       A 2020-01-18 14:00:00     9      0.6      S1         5              4
9       A 2020-01-19 12:00:00    12      0.9      S2         1              1
10      A 2020-01-19 12:30:00    13      0.5      S2         2              1
11      A 2020-01-19 13:00:00    14      0.3      S2         3              1
12      A 2020-01-19 13:30:00    15      0.7      S2         4              1
13      A 2020-01-19 14:00:00    16      0.6      S2         5              1
14      A 2020-01-19 14:00:00    17      0.8      S2         5              2
15      A 2020-01-19 14:00:00    19      0.3      S2         5              3
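
Note that factorize numbers the slots by order of first appearance, so an empty half hour would not leave a gap in slot_num. If the slot numbers should follow the clock instead, one possible sketch is to divide the time elapsed since the start of the session by 30 minutes. It assumes, and the question does not state this, that each session starts at the earliest appointment of that doctor on that day. The small frame below is illustrative only, with the 12:30 slot deliberately left empty.

```python
import pandas as pd

# Illustrative data only: the 12:30 slot on the first day is left empty on purpose.
df = pd.DataFrame({
    "Doctor": ["A", "A", "A", "A"],
    "Appointment": pd.to_datetime([
        "2020-01-18 12:00:00",
        "2020-01-18 13:00:00",
        "2020-01-18 13:00:00",
        "2020-01-19 12:30:00",
    ]),
})

# One session per calendar day, numbered in order of appearance.
day = df["Appointment"].dt.normalize()
df["Session"] = "S" + pd.Series(pd.factorize(day)[0] + 1, index=df.index).astype(str)

# Assumption: a session starts at its earliest appointment.
start = df.groupby(["Doctor", "Session"])["Appointment"].transform("min")

# Elapsed time floor-divided by the slot length gives a zero-based slot index.
df["slot_num"] = (df["Appointment"] - start) // pd.Timedelta(minutes=30) + 1

# Running count of patients within each slot.
df["Patient_count"] = df.groupby(["Doctor", "Session", "slot_num"]).cumcount() + 1
print(df)
```

With this data the 13:00 appointments get slot_num 3 rather than 2, because the empty 12:30 slot is still counted. If sessions begin at a fixed clock time, `start` could be replaced by that time on each date.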