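I am trying to calculate how many hours a person spends on specific tasks each week, but I can only retrieve the cumulative data once a week. In the first week, I get a table that looks like this:
Week 1 data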
import pandas as pd
week1 = pd.read_csv("Week1data.csv")
display(week1)
name ,task, date , hours ,
Bob , a , 10/28/2020 , 8 ,
Bob , b , 10/23/2020 , 8 ,
Bob , c , 10/22/2020 , 8 ,
David, a , 10/12/2020 , 8 ,
David, b , 10/20/2020 , 8 ,
David, d , 10/28/2020 , 8 ,
David, f , 10/24/2020 , 8 ,
Allen, b , 10/08/2020 , 8 ,
Allen, d , 10/18/2020 , 8 ,
Kora , a , 10/21/2020 , 8 ,
Kora , d , 10/23/2020 , 8 ,
Ash , a , 10/02/2020 , 8 ,
Ash , b , 10/08/2020 , 8 ,
Ash , e , 10/13/2020 , 8 ,
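In the second week, I get a table like this:
Week 2 data: note that people have performed new tasks and accumulated more hours on tasks they had already performed, and that new people have started performing tasks.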
week2 = pd.read_csv("Week2data.csv")
display(week2)
name ,task, date , hours ,
Bob , a , 11/01/2020 , 12 ,
Bob , b , 10/30/2020 , 15 ,
Bob , c , 10/30/2020 , 9 ,
Bob , d , 11/03/2020 , 5 ,
David, a , 11/05/2020 , 10 ,
David, b , 11/03/2020 , 9 ,
David, d , 11/01/2020 , 15 ,
David, f , 10/30/2020 , 18 ,
Becca, a , 11/04/2020 , 8 ,
Becca, c , 11/04/2020 , 3 ,
Allen, b , 11/04/2020 , 14 ,
Allen, d , 11/04/2020 , 10 ,
Kora , a , 11/01/2020 , 12 ,
Kora , d , 11/03/2020 , 9 ,
Ash , a , 11/02/2020 , 15 ,
Ash , b , 11/08/2020 , 18 ,
Ash , e , 11/03/2020 , 11 ,
Tim , a , 11/01/2020 , 8 ,
Tim , b , 11/03/2020 , 6 ,
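I cannot simply take the difference between the hours columns of the two dataframes. If I just compute Week2['hours'] - Week1['hours'], then after a few rows I would be subtracting the hours for 'Bob' performing task 'a' from the hours for 'David' performing task 'a', because the rows no longer line up. That is incorrect.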
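To make the misalignment concrete, here is a small sketch using only the first few rows of each week (the shortened tables and the `naive` variable name are illustrative, not part of my real data). Plain subtraction pairs rows by position in the index, not by name and task:

```python
import pandas as pd

# First rows of each week; week 2 has an extra row for Bob's new task "d".
week1 = pd.DataFrame(
    [("Bob", "a", 8), ("Bob", "b", 8), ("Bob", "c", 8), ("David", "a", 8)],
    columns=["name", "task", "hours"],
)
week2 = pd.DataFrame(
    [("Bob", "a", 12), ("Bob", "b", 15), ("Bob", "c", 9),
     ("Bob", "d", 5), ("David", "a", 10)],
    columns=["name", "task", "hours"],
)

# Subtraction aligns on the default integer index (0, 1, 2, ...),
# so row 3 pairs week 2's (Bob, d) with week 1's (David, a).
naive = week2["hours"] - week1["hours"]
print(naive)
```

From row 3 onward the values no longer describe a single person and task, and the last row has no partner in week 1 at all.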
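I would like to add an empty row to the week 1 data wherever there is a new person, or wherever an existing person performed a new task, and then take the difference once the empty rows have been added.
The adjusted week 1 table should look like this: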
name ,task, date , hours ,
Bob , a , 10/28/2020 , 8 ,
Bob , b , 10/23/2020 , 8 ,
Bob , c , 10/22/2020 , 8 ,
Nan , Nan, Nan , 0 ,
David, a , 10/12/2020 , 8 ,
David, b , 10/20/2020 , 8 ,
David, d , 10/28/2020 , 8 ,
David, f , 10/24/2020 , 8 ,
Nan , Nan, Nan , 0 ,
Nan , Nan, Nan , 0 ,
Allen, b , 10/08/2020 , 8 ,
Allen, d , 10/18/2020 , 8 ,
Kora , a , 10/21/2020 , 8 ,
Kora , d , 10/23/2020 , 8 ,
Ash , a , 10/02/2020 , 8 ,
Ash , b , 10/08/2020 , 8 ,
Ash , e , 10/13/2020 , 8 ,
Nan , Nan, Nan , 0 ,
Nan , Nan, Nan , 0 ,
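The interval columns for weeks 1, 2, 3, ..., n will be added to their own dataframe.
After I take the difference, the week 1 interval column would look like this, and the columns for weeks 2, 3, ..., n would be similar.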
Week 1 Interval = Week2['hours'] - Week1['hours']
week 1 interval,
4,
7,
1,
5,
2,
1,
7,
10,
8,
3,
6,
2,
4,
1,
7,
10,
3,
8,
6,
Answer 0 (score: 1)
Working from your example, I hope this helps:
import pandas as pd
week1 = [("Bob", "a", "10/28/2020", 8),
("Bob", "b", "10/23/2020", 8),
("Bob", "c", "10/22/2020", 8),
("David", "a", "10/12/2020", 8),
("David", "b", "10/20/2020", 8),
("David", "d", "10/28/2020", 8),
("David", "f", "10/24/2020", 8),
("Allen", "b", "10/08/2020", 8),
("Allen", "d", "10/18/2020", 8),
("Kora", "a", "10/21/2020", 8),
("Kora", "d", "10/23/2020", 8),
("Ash", "a", "10/02/2020", 8),
("Ash", "b", "10/08/2020", 8),
("Ash", "e", "10/13/2020", 8)]
week1 = pd.DataFrame(week1, columns=["name" ,"task", "date", "hours"])
week1["week"] = 1
week2 = [("Bob", "a", "11/01/2020", 12),
("Bob", "b", "10/30/2020", 15),
("Bob", "c", "10/30/2020", 9),
("Bob", "d", "11/03/2020", 5),
("David", "a", "11/05/2020", 10),
("David", "b", "11/03/2020", 9),
("David", "d", "11/01/2020", 15),
("David", "f", "10/30/2020", 18),
("Becca", "a", "11/04/2020", 8),
("Becca", "c", "11/04/2020", 3),
("Allen", "b", "11/04/2020", 14),
("Allen", "d", "11/04/2020", 10),
("Kora" , "a", "11/01/2020", 12),
("Kora" , "d", "11/03/2020", 9),
("Ash" , "a", "11/02/2020", 15),
("Ash" , "b", "11/08/2020", 18),
("Ash" , "e", "11/03/2020", 11),
("Tim" , "a", "11/01/2020", 8),
("Tim" , "b", "11/03/2020", 6)]
week2 = pd.DataFrame(week2, columns=["name" ,"task", "date", "hours"])
week2["week"] = 2
df = pd.concat([week1, week2])
df = df[["name", "task", "hours", "week"]]
pd.pivot_table(df, index=["name", "task"], values='hours', columns='week', aggfunc='sum').fillna(0)
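The pivot table puts each week's cumulative hours in its own column, so the weekly interval is the difference between those two columns. A minimal sketch on a shortened copy of the data (the `table` variable and the `interval` column name are my own choices for illustration):

```python
import pandas as pd

# Shortened sample of the cumulative data from the question.
week1 = pd.DataFrame(
    [("Bob", "a", 8), ("Bob", "b", 8), ("David", "a", 8)],
    columns=["name", "task", "hours"],
)
week2 = pd.DataFrame(
    [("Bob", "a", 12), ("Bob", "b", 15), ("Bob", "d", 5), ("David", "a", 10)],
    columns=["name", "task", "hours"],
)
week1["week"] = 1
week2["week"] = 2

df = pd.concat([week1, week2], ignore_index=True)
table = pd.pivot_table(
    df, index=["name", "task"], values="hours", columns="week", aggfunc="sum"
).fillna(0)

# The columns are labelled with the week numbers 1 and 2,
# so the interval is simply column 2 minus column 1.
table["interval"] = table[2] - table[1]
print(table)
```

Because `fillna(0)` turns a missing week 1 entry into 0, a task that first appears in week 2 gets its full week 2 hours as the interval.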
Answer 1 (score: 1)
To create the empty rows in week 1, you can use an outer merge:
df = week1.merge(week2, how='outer', on=['name', 'task'], suffixes=['_w1', '_w2']).sort_values(['name', 'task'])
df['hours_w1'] = df['hours_w1'].fillna(0)
Now you have a matching row for every name/task combination that appears in either week. You can then simply compute the difference:
df['interval'] = df['hours_w2'] - df['hours_w1']
The result will look like this:
name task date_w1 hours_w1 date_w2 hours_w2 interval
7 Allen b 10/08/2020 8.0 11/04/2020 14 6.0
8 Allen d 10/18/2020 8.0 11/04/2020 10 2.0
11 Ash a 10/02/2020 8.0 11/02/2020 15 7.0
12 Ash b 10/08/2020 8.0 11/08/2020 18 10.0
13 Ash e 10/13/2020 8.0 11/03/2020 11 3.0
15 Becca a NaN 0.0 11/04/2020 8 8.0
16 Becca c NaN 0.0 11/04/2020 3 3.0
0 Bob a 10/28/2020 8.0 11/01/2020 12 4.0
1 Bob b 10/23/2020 8.0 10/30/2020 15 7.0
2 Bob c 10/22/2020 8.0 10/30/2020 9 1.0
14 Bob d NaN 0.0 11/03/2020 5 5.0
3 David a 10/12/2020 8.0 11/05/2020 10 2.0
4 David b 10/20/2020 8.0 11/03/2020 9 1.0
5 David d 10/28/2020 8.0 11/01/2020 15 7.0
6 David f 10/24/2020 8.0 10/30/2020 18 10.0
9 Kora a 10/21/2020 8.0 11/01/2020 12 4.0
10 Kora d 10/23/2020 8.0 11/03/2020 9 1.0
17 Tim a NaN 0.0 11/01/2020 8 8.0
18 Tim b NaN 0.0 11/03/2020 6 6.0
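Since the question mentions repeating this for weeks 2, 3, ..., n, the same idea can be wrapped in a small helper. This is only a sketch: `weekly_interval` is a hypothetical name, and it assumes each (name, task) pair appears at most once per weekly snapshot. It aligns on an index instead of merging, which gives the same intervals:

```python
import pandas as pd

def weekly_interval(prev, curr):
    """Hours added between two cumulative snapshots, aligned on (name, task)."""
    prev_hours = prev.set_index(["name", "task"])["hours"]
    curr_hours = curr.set_index(["name", "task"])["hours"]
    # fill_value=0 treats a pair missing from one snapshot as 0 hours there.
    return curr_hours.sub(prev_hours, fill_value=0).rename("interval")

# Shortened sample of the cumulative data from the question.
week1 = pd.DataFrame(
    [("Bob", "a", 8), ("Bob", "b", 8), ("David", "a", 8)],
    columns=["name", "task", "hours"],
)
week2 = pd.DataFrame(
    [("Bob", "a", 12), ("Bob", "b", 15), ("Bob", "d", 5), ("David", "a", 10)],
    columns=["name", "task", "hours"],
)

interval = weekly_interval(week1, week2)
print(interval)
```

Calling the helper on each consecutive pair of snapshots (week 1 and 2, week 2 and 3, and so on) would give one interval series per week, which could then be collected into a separate dataframe.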