Question

I have a pandas DataFrame with employee PTO entries:

employee_id  time_off_date
1            1/1/2017
1            1/2/2017
1            1/3/2017
1            5/1/2017
2            6/1/2017
2            9/5/2017
2            9/6/2017
2            9/7/2017
2            9/8/2017

I'm trying to find how long each employee's PTO lasts in consecutive days. For example, employee #1's longest PTO stretch is 3 days (1/1/2017 to 1/3/2017), and employee #2's longest is 4 days (9/5/2017 to 9/8/2017).

I'm a bit stumped on which pandas operations I should combine with groupby to answer this.
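To experiment with this, here is a minimal sketch that rebuilds the sample data as a DataFrame. It assumes the dates are US-style month/day/year strings; parsing them to datetime64 and sorting within each employee is what later lets diff() measure day gaps.

```python
import pandas as pd

# Sample data from the question (dates as month/day/year strings)
df = pd.DataFrame({
    "employee_id": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "time_off_date": ["1/1/2017", "1/2/2017", "1/3/2017", "5/1/2017",
                      "6/1/2017", "9/5/2017", "9/6/2017", "9/7/2017", "9/8/2017"],
})

# Parse to datetime64 so diff() yields Timedelta values;
# the explicit format assumes month/day/year order
df["time_off_date"] = pd.to_datetime(df["time_off_date"], format="%m/%d/%Y")

# Consecutive-day logic relies on dates being sorted within each employee
df = df.sort_values(["employee_id", "time_off_date"]).reset_index(drop=True)
print(df)
```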

Bonus points: find all PTO durations, for all employees, within X days.
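For the bonus, one possible reading is "list every PTO stretch for every employee, keeping only those of at least X days". The sketch below assumes that interpretation; `pto_stretches` and its `min_days` parameter are hypothetical names, not part of pandas. It uses the same diff/cumsum idea as the answer, but returns one row per stretch with its start date, end date, and length.

```python
import pandas as pd

def pto_stretches(df: pd.DataFrame, min_days: int = 1) -> pd.DataFrame:
    """Hypothetical helper: one row per run of consecutive PTO days.

    Assumes columns 'employee_id' and 'time_off_date' (datetime64), and
    reads "X days" as a minimum stretch length (an assumption).
    """
    d = df.sort_values(["employee_id", "time_off_date"])
    # A gap other than exactly one day (or the first row of an employee,
    # where diff() is NaT) starts a new run
    new_run = d.groupby("employee_id")["time_off_date"].diff().dt.days.ne(1)
    run_id = new_run.cumsum().rename("run")
    runs = (d.groupby(["employee_id", run_id])["time_off_date"]
              .agg(start="min", end="max", days="size")
              .reset_index(level="run", drop=True)
              .reset_index())
    return runs[runs["days"] >= min_days].reset_index(drop=True)

# Usage with the question's sample data
sample = pd.DataFrame({
    "employee_id": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "time_off_date": pd.to_datetime(
        ["1/1/2017", "1/2/2017", "1/3/2017", "5/1/2017", "6/1/2017",
         "9/5/2017", "9/6/2017", "9/7/2017", "9/8/2017"], format="%m/%d/%Y"),
})
print(pto_stretches(sample, min_days=3))
```

If "within X days" was instead meant as a date window, the same `runs` table could be filtered on `start` and `end` rather than on `days`.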

Let me know if you have any questions.

Answer 1

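Within each employee_id, use diff and cumsum to build a sub-group key (a new key starts whenever the gap to the previous date is not exactly one day). Then groupby that key, take the size of each block, and use max to find the longest one.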
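To see what that sub-group key looks like, here is a minimal sketch on employee 2's dates alone. diff() gives the day gap to the previous row, ne(1) flags every row that does not continue the previous day, and cumsum() turns those flags into block labels.

```python
import pandas as pd

# Employee 2's dates from the question
dates = pd.Series(pd.to_datetime(
    ["6/1/2017", "9/5/2017", "9/6/2017", "9/7/2017", "9/8/2017"],
    format="%m/%d/%Y"))

gap_days = dates.diff().dt.days      # NaN, 96, 1, 1, 1
new_block = gap_days.ne(1)           # True where a new stretch begins (NaN != 1)
block_id = new_block.cumsum()        # 1, 2, 2, 2, 2
print(block_id.tolist())             # [1, 2, 2, 2, 2]
print(block_id.value_counts().max()) # 4, the longest stretch
```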

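import pandas as pd
# diff() needs real dates, and runs only make sense on sorted dates
df['time_off_date'] = pd.to_datetime(df['time_off_date'], format='%m/%d/%Y')
df = df.sort_values(['employee_id', 'time_off_date'])
# Start a new block whenever the gap to the previous day is not exactly 1 day
df['New'] = (df.groupby('employee_id')['time_off_date']
               .transform(lambda x: x.diff().dt.days.ne(1).cumsum()))
# Size of each block, then the largest block per employee
df.groupby(['employee_id', 'New']).size().groupby(level=0).max()
Out[423]: 
employee_id
1    3
2    4
dtype: int64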
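The question's examples also quote the start and end date of each longest stretch, which the size/max result does not show. A minimal sketch, assuming the same sample data and block key as above, aggregates each block's first and last date and then picks the largest block per employee with idxmax (on ties, the earliest block wins).

```python
import pandas as pd

df = pd.DataFrame({
    "employee_id": [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "time_off_date": ["1/1/2017", "1/2/2017", "1/3/2017", "5/1/2017",
                      "6/1/2017", "9/5/2017", "9/6/2017", "9/7/2017", "9/8/2017"],
})
df["time_off_date"] = pd.to_datetime(df["time_off_date"], format="%m/%d/%Y")
df = df.sort_values(["employee_id", "time_off_date"])

# Same block key as the answer: new block whenever the gap is not exactly one day
df["New"] = (df.groupby("employee_id")["time_off_date"]
               .transform(lambda x: x.diff().dt.days.ne(1).cumsum()))

# One row per block: first day, last day, number of days
runs = (df.groupby(["employee_id", "New"])["time_off_date"]
          .agg(start="min", end="max", days="size"))

# idxmax returns the (employee_id, New) label of the longest block per employee
longest = runs.loc[runs.groupby(level="employee_id")["days"].idxmax().tolist()]
print(longest)
```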

Pandas - groupby "longest" paid time off (PTO)
