Calculate task completion time (in pandas)

Time: 2020-05-11 19:29:03

Tags: python pandas

I have student data like this.

date        student_name    tasks   remarks

2012-12-01  sarita          -100    Complete 100 tasks
2013-12-04  manu            -35     complete 35 taks
2013-01-15  sarita           10     completed 10 tasks
2013-02-13  sarita          -25     Complete 25 more tasks
2013-03-13  sarita           30     completed 30 taks
2013-03-12  manu             10     completed 10 tasks 

How do I calculate the Completed and Completion Date columns?

The final result should be

date        student_name    tasks   remarks                 Completed  Completion Date

2012-12-01  sarita          -100    Complete 100 tasks      Yes        2013-04-12
2013-12-04  manu            -35     complete 35 taks        No         'Not Completed Yet'
2013-01-15  sarita           10     completed 10 tasks      NaN
2013-02-13  sarita          -25     complete 25 more tasks  No         'Not Completed Yet'
2013-03-13  sarita           30     completed 30 taks       NaN
2013-03-12  manu             10     completed 10 tasks      NaN
2013-04-12  sarita           70     completed 70 tasks      NaN
2013-05-16  sarita            8     completed 8 tasks       NaN

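I want to compute the Completed and Completion Date columns. Should I create a separate DF for this?

Completed should be computed from the number of positive tasks the user has completed so far.

As of now, sarita has completed 118 tasks. So whenever I run the DF, the row for the -100 assignment should have Completed set to Yes and Completion Date set to 2013-04-12, because those tasks have already been completed.

For the row dated 2013-02-13 with student_name sarita, Completed should be set to No, because she has completed only 18 of the next 25 tasks. Once positive entries covering the remaining 7 or more tasks are inserted, Completed should be set to Yes and Completion Date should be set accordingly.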
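The arithmetic described above can be checked with a few lines of plain Python. This is only a sketch: the list `completed` is typed in by hand from sarita's positive rows in the expected table, and the names `running`, `first_done` and `short_by` are made up for illustration.

```python
from itertools import accumulate

# sarita's positive (completed) entries in date order, copied from the expected table
completed = [
    ("2013-01-15", 10),
    ("2013-03-13", 30),
    ("2013-04-12", 70),
    ("2013-05-16", 8),
]

# running total of completed tasks after each entry
running = list(accumulate(n for _, n in completed))

# first assignment: 100 tasks; find the first date on which the total reaches 100
first_done = next(d for (d, _), total in zip(completed, running) if total >= 100)

# second assignment: 25 more tasks, so 100 + 25 = 125 are needed in total
short_by = (100 + 25) - running[-1]

print(running)     # [10, 40, 110, 118]
print(first_done)  # 2013-04-12
print(short_by)    # 7
```

So the -100 row is covered on 2013-04-12, and the -25 row stays open until at least 7 more tasks are recorded.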

I hope this clears it up.

1 answer:

Answer 0: (score: 0)

Here you go, this should do what you want:

import pandas as pd

# sample data (includes the two later rows shown in the expected result)
d = {'date':['2012-12-01', '2013-12-04', '2013-01-15', '2013-02-13', '2013-03-13', '2013-03-12', '2013-04-12', '2013-05-16'],
 'student_name':['sarita', 'manu', 'sarita', 'sarita', 'sarita', 'manu', 'sarita', 'sarita'],
 'tasks':[-100, -35, 10, -25, 30, 10, 70, 8],
 'remarks':['Complete 100 tasks', 'complete 35 taks', 'completed 10 tasks', 'Complete 25 more tasks', 'completed 30 taks', 'completed 10 tasks', 'completed 70 tasks', 'completed 8 tasks']}

# create dataframe
df = pd.DataFrame(data=d)
# convert string to date (the strings use '-' separators, so the format must too)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
# create new empty columns
df['Completed'] = ''
# object dtype so this column can hold both dates and text
df['Completion Date'] = pd.Series('', index=df.index, dtype=object)

# get list of students
students = df['student_name'].unique().tolist()

# loop over students
for student in students:
    # get student records
    studentRecords = df.loc[df['student_name'] == student]
    # split into assigned / completed tasks (case-sensitive match on 'completed')
    isCompleted = studentRecords['remarks'].str.contains('completed')
    # assigned rows keep their original index so df can be updated row by row
    assignedTasks = studentRecords.loc[~isCompleted].sort_values(by=['date'])
    completedTasks = studentRecords.loc[isCompleted].sort_values(by=['date']).reset_index(drop=True)
    # loop over assigned tasks, oldest first
    for i, row in assignedTasks.iterrows():
        # number of tasks to complete, as a positive number
        tasks = -row['tasks']
        # get cumulative tasks sum
        completedTasks['cumsum'] = completedTasks['tasks'].cumsum()
        # flag rows where the running total covers this assignment
        completedTasks['finishedAssignment'] = (completedTasks['cumsum'] >= tasks).astype(int)
        # first row (if any) on which the assignment is covered
        neededinfo = completedTasks[completedTasks['finishedAssignment'] == 1].head(1)
        # if length is zero then the assignment has not been completed
        if len(neededinfo) == 0:
            # update records
            df.loc[i, 'Completed'] = 'No'
            df.loc[i, 'Completion Date'] = 'Not Completed Yet'
            # tasks are consumed in order, so nothing is left for later assignments
            completedTasks = completedTasks.iloc[0:0]
        # if completed
        else:
            # get date of completion
            onDate = neededinfo.iloc[0]['date']
            # running total on that date
            tasksTillDate = neededinfo.iloc[0]['cumsum']
            # drop the rows that were used up before the completing row
            completedTasks = completedTasks.loc[completedTasks['finishedAssignment'] == 1].reset_index(drop=True)
            # keep only the surplus on the completing row; it counts toward the next assignment
            completedTasks.loc[(completedTasks['cumsum'] == tasksTillDate) & (completedTasks['date'] == onDate), 'tasks'] = tasksTillDate - tasks
            # update records
            df.loc[i, 'Completed'] = 'Yes'
            df.loc[i, 'Completion Date'] = onDate

print(df)

date        student_name  tasks  remarks                 Completed  Completion Date
2012-12-01  sarita        -100   Complete 100 tasks      Yes        2013-04-12 00:00:00
2013-12-04  manu          -35    complete 35 taks        No         Not Completed Yet
2013-01-15  sarita        10     completed 10 tasks
2013-02-13  sarita        -25    Complete 25 more tasks  No         Not Completed Yet
2013-03-13  sarita        30     completed 30 taks
2013-03-12  manu          10     completed 10 tasks
2013-04-12  sarita        70     completed 70 tasks
2013-05-16  sarita        8      completed 8 tasks
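
For comparison, the same first-in-first-out matching can probably be written with running totals and `numpy.searchsorted` instead of trimming the completed rows by hand. This is an alternative sketch, not the method used in the answer: the function name `completion_status` is hypothetical, and it assumes that assigned rows have negative `tasks`, completed rows have positive `tasks`, and each student's assignments are filled in date order.

```python
import numpy as np
import pandas as pd


def completion_status(df):
    """Return a copy of df with 'Completed' and 'Completion Date' filled in."""
    out = df.copy()
    out["Completed"] = ""
    # object dtype so the column can hold both dates and text
    out["Completion Date"] = pd.Series("", index=out.index, dtype=object)
    ordered = out.sort_values("date", kind="mergesort")
    for _, group in ordered.groupby("student_name"):
        assigned = group[group["tasks"] < 0]
        done = group[group["tasks"] > 0]
        # running totals: tasks owed so far vs. tasks finished so far
        owed = (-assigned["tasks"]).cumsum().to_numpy()
        finished = done["tasks"].cumsum().to_numpy()
        # position of the first completed row whose total covers each amount owed
        pos = np.searchsorted(finished, owed, side="left")
        for idx, p in zip(assigned.index, pos):
            if p < len(finished):
                out.loc[idx, "Completed"] = "Yes"
                out.loc[idx, "Completion Date"] = done["date"].iloc[p]
            else:
                out.loc[idx, "Completed"] = "No"
                out.loc[idx, "Completion Date"] = "Not Completed Yet"
    return out


# usage with the data from the question
data = pd.DataFrame({
    "date": pd.to_datetime(["2012-12-01", "2013-12-04", "2013-01-15", "2013-02-13",
                            "2013-03-13", "2013-03-12", "2013-04-12", "2013-05-16"]),
    "student_name": ["sarita", "manu", "sarita", "sarita",
                     "sarita", "manu", "sarita", "sarita"],
    "tasks": [-100, -35, 10, -25, 30, 10, 70, 8],
})
result = completion_status(data)
print(result)
```

Because the second assignment is compared against the cumulative amount owed (100 + 25), no surplus has to be carried from one assignment to the next.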