Question

Hello, I have the following table:

id       | start_date | end_date   | state
52183371 | 2015-03-31 | 2015-03-31 | working
52183371 | 2015-04-01 | 2015-04-31 | working
52183371 | 2015-04-02 | 2015-04-28 | working
52183371 | 2015-04-21 | 2015-04-30 | not_working

In this table, I want to count the rows in the working state whose start_date is greater than the end_date of all previous rows.

The result I would like to see is:

id       | start_date | end_date   | state      | working_count
52183371 | 2015-03-31 | 2015-03-31 | working    | NaN
52183371 | 2015-04-01 | 2015-04-31 | working    | 1
52183371 | 2015-04-02 | 2015-04-28 | working    | 1
52183371 | 2015-04-21 | 2015-04-30 | not_working| 1

In the last row, start_date is lower than a previous end_date, so I don't want to count it.

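At the moment I'm thinking of using a loop: take the unique start dates, iterate over them, filter the data by each start date, and then do the calculation. But is there a pandas way to do this?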

Answer 1

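If I understand your question, you want to check the end_date of all preceding rows. I think one way is to use cummax on the end_date column to get the running maximum up to the current row. So if you do: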

(df.start_date > df.end_date.cummax().shift()).cumsum()

this compares start_date with the maximum end_date up to the previous row, which gives the expected output.
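
To make the steps visible, here is a minimal sketch that runs the expression above on the sample data from the question. It rests on two assumptions: the date columns are parsed as datetimes, and the question's end_date of 2015-04-31 (not a valid calendar date) is replaced by 2015-04-30. The intermediate names prev_max_end and starts_after_all are only illustrative.

```python
import pandas as pd

# Sample data from the question. Assumption: "2015-04-31" is not a valid
# calendar date, so 2015-04-30 is used in its place.
df = pd.DataFrame({
    "id": [52183371] * 4,
    "start_date": pd.to_datetime(
        ["2015-03-31", "2015-04-01", "2015-04-02", "2015-04-21"]),
    "end_date": pd.to_datetime(
        ["2015-03-31", "2015-04-30", "2015-04-28", "2015-04-30"]),
    "state": ["working", "working", "working", "not_working"],
})

# Largest end_date among all earlier rows (NaT for the first row).
prev_max_end = df["end_date"].cummax().shift()

# True where the current row starts after every earlier row has ended.
# A comparison against NaT evaluates to False, so the first row is False.
starts_after_all = df["start_date"] > prev_max_end

# Running count of such rows.
df["working_count"] = starts_after_all.cumsum()
print(df)
```

On this data the first row comes out as 0 rather than the NaN shown in the question, because the comparison with the missing shifted value is False. Note also that the expression does not look at the state or id columns; if the table holds several ids, the same steps would presumably need to be applied per group.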
