I have student data like this:
date student_name tasks remarks
2012-12-01 sarita -100 Complete 100 tasks
2013-12-04 manu -35 complete 35 taks
2013-01-15 sarita 10 completed 10 tasks
2013-02-13 sarita -25 Complete 25 more tasks
2013-03-13 sarita 30 completed 30 taks
2013-03-12 manu 10 completed 10 tasks
How do I calculate the "complete on" and "date completed" columns?
The final result should be:
date student_name tasks remarks Completed Completion Date
2012-12-01 sarita -100 Complete 100 tasks Yes 2013-04-12
2013-12-04 manu -35 complete 35 taks No 'Not Completed Yet'
2013-01-15 sarita 10 completed 10 tasks NaN
2013-02-13 sarita -25 complete 25 more tasks No 'Not Completed Yet'
2013-03-13 sarita 30 completed 30 taks NaN
2013-03-12 manu 10 completed 10 tasks NaN
2013-04-12 sarita 70 completed 70 tasks NaN
2013-05-16 sarita 8 completed 8 tasks NaN
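I want to calculate the Completed and Completion Date columns.
Should I create a separate DataFrame for this?
Completed should be calculated from the number of positive tasks the student has completed so far.
To date, sarita has completed 118 tasks (10 + 30 + 70 + 8). So whenever I run this on the DF, the -100 row dated 2012-12-01 should have Completed set to Yes and Completion Date set to 2013-04-12, the date on which those 100 tasks were finished.
For the row dated 2013-02-13 with student_name sarita, Completed should be set to No, because she has completed only 18 of the next 25 tasks. Once a positive entry covering the remaining 7 or more tasks is inserted, Completed should be set to Yes and Completion Date should be set accordingly.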
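To make the arithmetic concrete, here is a minimal sketch for sarita alone. It assumes completed tasks are credited to the oldest open assignment first; the variable names are only illustrative.

```python
# Positive entries for sarita in date order (taken from the tables above)
completed = [("2013-01-15", 10), ("2013-03-13", 30),
             ("2013-04-12", 70), ("2013-05-16", 8)]
# Assignments for sarita, oldest first, written as positive requirements
assigned = [("2012-12-01", 100), ("2013-02-13", 25)]

total_done = sum(n for _, n in completed)           # 118 tasks so far
left_after_first = total_done - assigned[0][1]      # 18 left for the second assignment
still_needed = assigned[1][1] - left_after_first    # 7 more tasks required

# Date on which the running total first reaches the first requirement of 100
running = 0
first_done_on = None
for day, n in completed:
    running += n
    if running >= assigned[0][1]:
        first_done_on = day
        break

print(total_done, left_after_first, still_needed, first_done_on)
```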
I hope this clarifies things.
Answer 0 (score: 0)
Here you go; this does what you want:
import pandas as pd

# Sample data
d = {'date': ['2012-12-01', '2013-12-04', '2013-01-15', '2013-02-13', '2013-03-13', '2013-03-12', '2013-04-12', '2013-05-16'],
     'student_name': ['sarita', 'manu', 'sarita', 'sarita', 'sarita', 'manu', 'sarita', 'sarita'],
     'tasks': [-100, -35, 10, -25, 30, 10, 70, 8],
     'remarks': ['Complete 100 tasks', 'complete 35 taks', 'completed 10 tasks', 'Complete 25 more tasks', 'completed 30 taks', 'completed 10 tasks', 'completed 70 tasks', 'completed 8 tasks']}

# Create the DataFrame
df = pd.DataFrame(data=d)

# Convert the date strings to datetimes (the data uses dashes, not slashes)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

# Create the new, initially empty columns (object dtype so they can hold text or dates)
df['Completed'] = pd.Series('', index=df.index, dtype=object)
df['Completion Date'] = pd.Series('', index=df.index, dtype=object)

# Loop over students
for student in df['student_name'].unique():
    # All records for this student
    studentRecords = df.loc[df['student_name'] == student]
    # Split into assigned and completed tasks, each in date order
    isCompleted = studentRecords['remarks'].str.contains('completed')
    assignedTasks = studentRecords.loc[~isCompleted].sort_values(by='date')
    completedTasks = studentRecords.loc[isCompleted].sort_values(by='date').reset_index(drop=True)
    # Loop over assignments, oldest first; idx is the row's label in df
    for idx, row in assignedTasks.iterrows():
        # Number of tasks required, as a positive number
        tasks = -row['tasks']
        # Running total of completed tasks not yet used by an earlier assignment
        completedTasks['cumsum'] = completedTasks['tasks'].cumsum()
        # Rows at which the running total meets the requirement
        finished = completedTasks['cumsum'] >= tasks
        if not finished.any():
            # Not enough completed tasks yet
            df.loc[idx, 'Completed'] = 'No'
            df.loc[idx, 'Completion Date'] = 'Not Completed Yet'
        else:
            # The first such row gives the completion date
            neededinfo = completedTasks.loc[finished].iloc[0]
            onDate = neededinfo['date']
            tasksTillDate = neededinfo['cumsum']
            # Drop the entries used up by this assignment...
            completedTasks = completedTasks.loc[finished].reset_index(drop=True)
            # ...and carry the surplus of the completing entry over to the next assignment
            completedTasks.loc[0, 'tasks'] = tasksTillDate - tasks
            # Update this assignment row by its index label, not by its task count
            df.loc[idx, 'Completed'] = 'Yes'
            df.loc[idx, 'Completion Date'] = onDate

print(df)

Output:
date student_name tasks remarks Completed Completion Date
2012-12-01 sarita -100 Complete 100 tasks Yes 2013-04-12 00:00:00
2013-12-04 manu -35 complete 35 taks No Not Completed Yet
2013-01-15 sarita 10 completed 10 tasks
2013-02-13 sarita -25 Complete 25 more tasks No Not Completed Yet
2013-03-13 sarita 30 completed 30 taks
2013-03-12 manu 10 completed 10 tasks
2013-04-12 sarita 70 completed 70 tasks
2013-05-16 sarita 8 completed 8 tasks
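As a possible alternative to the explicit loop, the same "oldest assignment first" rule can be sketched with cumulative sums: an assignment counts as done on the first date at which the student's running total of completed tasks reaches the running total of tasks required up to that assignment. The sketch below is only one way to do it. `add_completion_columns` is a hypothetical helper name, not a pandas function, and it assumes that assignments are exactly the rows with negative `tasks`, that completions are the rows with positive `tasks`, and that unfinished assignments get an empty date (NaT) rather than the text 'Not Completed Yet'.

```python
import numpy as np
import pandas as pd


def add_completion_columns(df):
    """Return a copy of df with 'Completed' and 'Completion Date' filled in
    for assignment rows (tasks < 0); all other rows keep NaN / NaT."""
    out = df.copy()
    out['date'] = pd.to_datetime(out['date'])
    out['Completed'] = pd.Series(np.nan, index=out.index, dtype=object)
    out['Completion Date'] = pd.NaT

    ordered = out.sort_values('date', kind='mergesort')
    for _, grp in ordered.groupby('student_name'):
        assigned = grp[grp['tasks'] < 0]
        done = grp[grp['tasks'] > 0]
        if assigned.empty:
            continue
        # Running total required after each assignment, e.g. 100, 125
        required = (-assigned['tasks']).cumsum().to_numpy()
        # Running total completed after each positive entry, e.g. 10, 40, 110, 118
        achieved = done['tasks'].cumsum().to_numpy()
        # Position of the first positive entry whose running total reaches each requirement
        pos = np.searchsorted(achieved, required, side='left')
        met = pos < len(achieved)

        out.loc[assigned.index, 'Completed'] = np.where(met, 'Yes', 'No')
        if met.any():
            dates = done['date'].to_numpy()[pos[met]]
            out.loc[assigned.index[met], 'Completion Date'] = dates
    return out


# Usage with the same sample data as above
data = {
    'date': ['2012-12-01', '2013-12-04', '2013-01-15', '2013-02-13',
             '2013-03-13', '2013-03-12', '2013-04-12', '2013-05-16'],
    'student_name': ['sarita', 'manu', 'sarita', 'sarita',
                     'sarita', 'manu', 'sarita', 'sarita'],
    'tasks': [-100, -35, 10, -25, 30, 10, 70, 8],
}
result = add_completion_columns(pd.DataFrame(data))
print(result)
```

Keeping Completion Date as a real datetime column makes later date arithmetic easier; if the 'Not Completed Yet' text is preferred, the column would have to hold mixed types, as in the loop version.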