Is there a way to add empty rows to a DataFrame based on a set of conditions?

Time: 2020-10-28 14:38:07

Tags: python pandas dataframe

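I am trying to calculate how many hours each person spends on a particular task per week, but I can only retrieve cumulative data once a week. In the first week, I get a table that looks like this.

Week 1 data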

import pandas as pd

week1 = pd.read_csv("Week1data.csv")
display(week1)

name ,task, date       , hours ,
Bob  , a  , 10/28/2020 , 8     ,
Bob  , b  , 10/23/2020 , 8     ,
Bob  , c  , 10/22/2020 , 8     ,
David, a  , 10/12/2020 , 8     ,
David, b  , 10/20/2020 , 8     ,
David, d  , 10/28/2020 , 8     ,
David, f  , 10/24/2020 , 8     ,
Allen, b  , 10/08/2020 , 8     ,
Allen, d  , 10/18/2020 , 8     ,
Kora , a  , 10/21/2020 , 8     ,
Kora , d  , 10/23/2020 , 8     ,
Ash  , a  , 10/02/2020 , 8     ,
Ash  , b  , 10/08/2020 , 8     ,
Ash  , e  , 10/13/2020 , 8     ,

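In the second week, I get a table like this.

Week 2 data: note that people have performed new tasks, more hours have accumulated on previously performed tasks, and new people have started performing tasks.

week2 = pd.read_csv("Week2data.csv")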
display(week2)

name ,task, date       , hours ,
Bob  , a  , 11/01/2020 , 12    ,
Bob  , b  , 10/30/2020 , 15    ,
Bob  , c  , 10/30/2020 , 9     ,
Bob  , d  , 11/03/2020 , 5     ,
David, a  , 11/05/2020 , 10    ,
David, b  , 11/03/2020 , 9     ,
David, d  , 11/01/2020 , 15    ,
David, f  , 10/30/2020 , 18    ,
Becca, a  , 11/04/2020 , 8     ,
Becca, c  , 11/04/2020 , 3     ,
Allen, b  , 11/04/2020 , 14    ,
Allen, d  , 11/04/2020 , 10    ,
Kora , a  , 11/01/2020 , 12    ,
Kora , d  , 11/03/2020 , 9     ,
Ash  , a  , 11/02/2020 , 15    ,
Ash  , b  , 11/08/2020 , 18    ,
Ash  , e  , 11/03/2020 , 11    ,
Tim  , a  , 11/01/2020 , 8     ,
Tim  , b  , 11/03/2020 , 6     ,

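I can't simply take the difference between the hours columns of the two DataFrames. If I just compute Week2['hours'] - Week1['hours'], then after a few rows the two tables no longer line up, and I would be subtracting the hours 'Bob' spent on task 'a' from the hours 'David' spent on task 'a'. That is incorrect.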
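To make the misalignment concrete, here is a minimal sketch on a small subset of the two tables above. The frames are built inline (rather than read from the CSV files) purely for illustration. It shows that subtracting two Series pairs rows by their index labels, not by name and task.

```python
import pandas as pd

# Subset of the week 1 and week 2 tables, built inline for illustration.
week1 = pd.DataFrame({
    "name": ["Bob", "Bob", "Bob", "David"],
    "task": ["a", "b", "c", "a"],
    "hours": [8, 8, 8, 8],
})
week2 = pd.DataFrame({
    "name": ["Bob", "Bob", "Bob", "Bob", "David"],
    "task": ["a", "b", "c", "d", "a"],
    "hours": [12, 15, 9, 5, 10],
})

# Series subtraction aligns on the row index (0, 1, 2, ...),
# so name and task are never consulted.
naive = week2["hours"] - week1["hours"]

# Row 3 pairs week 2's Bob/d (5 hours) with week 1's David/a (8 hours),
# and row 4 has no week 1 partner at all, so it becomes NaN.
print(naive)
```

As soon as week 2 contains a row that week 1 lacks, every later row is paired with the wrong partner.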

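I want to add an empty row to the week 1 data wherever a new employee appears, or wherever an existing employee has performed a new task, and then take the difference after the empty rows have been added.

The adjusted week 1 table should look like this.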

name ,task, date       , hours ,
Bob  , a  , 10/28/2020 , 8     ,
Bob  , b  , 10/23/2020 , 8     ,
Bob  , c  , 10/22/2020 , 8     ,
Nan  , Nan, Nan        , 0     ,
David, a  , 10/12/2020 , 8     ,
David, b  , 10/20/2020 , 8     ,
David, d  , 10/28/2020 , 8     ,
David, f  , 10/24/2020 , 8     ,
Nan  , Nan, Nan        , 0     ,
Nan  , Nan, Nan        , 0     ,
Allen, b  , 10/08/2020 , 8     ,
Allen, d  , 10/18/2020 , 8     ,
Kora , a  , 10/21/2020 , 8     ,
Kora , d  , 10/23/2020 , 8     ,
Ash  , a  , 10/02/2020 , 8     ,
Ash  , b  , 10/08/2020 , 8     ,
Ash  , e  , 10/13/2020 , 8     ,
Nan  , Nan, Nan        , 0     ,
Nan  , Nan, Nan        , 0     ,

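The interval columns for weeks 1, 2, 3, ..., n will be added to their own DataFrame.

After I take the difference, the week 1 interval column would look like this, and the columns for weeks 2, 3, ..., n would be similar.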

Week 1 Interval = Week2['hours'] - Week1['hours']

week 1 interval,
4,
7,
1,
5,
2,
1,
7,
10,
8,
3,
6,
2,
4,
1,
7,
10,
3,
8,
6,

2 Answers:

Answer 0 (score: 1)

Using your example, I hope this helps:

import pandas as pd

# Week 1 cumulative hours per (name, task)
week1 = [("Bob", "a", "10/28/2020", 8),
         ("Bob", "b", "10/23/2020", 8),
         ("Bob", "c", "10/22/2020", 8),
         ("David", "a", "10/12/2020", 8),
         ("David", "b", "10/20/2020", 8),
         ("David", "d", "10/28/2020", 8),
         ("David", "f", "10/24/2020", 8),
         ("Allen", "b", "10/08/2020", 8),
         ("Allen", "d", "10/18/2020", 8),
         ("Kora", "a", "10/21/2020", 8),
         ("Kora", "d", "10/23/2020", 8),
         ("Ash", "a", "10/02/2020", 8),
         ("Ash", "b", "10/08/2020", 8),
         ("Ash", "e", "10/13/2020", 8)]

week1 = pd.DataFrame(week1, columns=["name", "task", "date", "hours"])
week1["week"] = 1

# Week 2 cumulative hours per (name, task)
week2 = [("Bob", "a", "11/01/2020", 12),
         ("Bob", "b", "10/30/2020", 15),
         ("Bob", "c", "10/30/2020", 9),
         ("Bob", "d", "11/03/2020", 5),
         ("David", "a", "11/05/2020", 10),
         ("David", "b", "11/03/2020", 9),
         ("David", "d", "11/01/2020", 15),
         ("David", "f", "10/30/2020", 18),
         ("Becca", "a", "11/04/2020", 8),
         ("Becca", "c", "11/04/2020", 3),
         ("Allen", "b", "11/04/2020", 14),
         ("Allen", "d", "11/04/2020", 10),
         ("Kora", "a", "11/01/2020", 12),
         ("Kora", "d", "11/03/2020", 9),
         ("Ash", "a", "11/02/2020", 15),
         ("Ash", "b", "11/08/2020", 18),
         ("Ash", "e", "11/03/2020", 11),
         ("Tim", "a", "11/01/2020", 8),
         ("Tim", "b", "11/03/2020", 6)]

week2 = pd.DataFrame(week2, columns=["name", "task", "date", "hours"])
week2["week"] = 2

# Stack both weeks and keep only the columns needed for the pivot
df = pd.concat([week1, week2])
df = df[["name", "task", "hours", "week"]]

# One row per (name, task), one column per week; missing combinations become 0
result = pd.pivot_table(df, index=["name", "task"], values="hours", columns="week", aggfunc="sum").fillna(0)
print(result)

[Image: merge output without fillna()]
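Once the weeks sit side by side as columns, the per-week intervals can be derived with a column-wise difference. The following is only a sketch on a small subset of the data; the names `long`, `wide`, and `intervals` and the "week N interval" labels are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Small subset of the question's data, built inline for illustration.
week1 = pd.DataFrame(
    [("Bob", "a", 8), ("Bob", "b", 8), ("David", "a", 8)],
    columns=["name", "task", "hours"],
)
week2 = pd.DataFrame(
    [("Bob", "a", 12), ("Bob", "b", 15), ("Bob", "d", 5), ("David", "a", 10)],
    columns=["name", "task", "hours"],
)

# Stack the weekly snapshots with a week label, then pivot weeks into columns.
long = pd.concat([week1.assign(week=1), week2.assign(week=2)], ignore_index=True)
wide = long.pivot_table(index=["name", "task"], columns="week",
                        values="hours", aggfunc="sum", fill_value=0)

# diff(axis=1) subtracts each week column from the next one.
# The first week has no predecessor, so its all-NaN column is dropped.
intervals = wide.diff(axis=1).iloc[:, 1:]
intervals.columns = [f"week {w - 1} interval" for w in intervals.columns]
print(intervals)
```

Because the difference runs across all week columns at once, the same call should extend to weeks 3 through n when further weekly snapshots are concatenated in the same way.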

Answer 1 (score: 1)

To create the empty rows in week 1, you can use an outer merge:

df = week1.merge(week2, how='outer', on=['name', 'task'], suffixes=['_w1', '_w2']).sort_values(['name', 'task'])

df['hours_w1'] = df['hours_w1'].fillna(0)

Now every name/task combination has a matching row for both weeks, so you can simply compute the difference:

df['interval'] = df['hours_w2'] - df['hours_w1']

The result will look like this:

     name  task       date_w1  hours_w1       date_w2  hours_w2  interval
7   Allen   b     10/08/2020        8.0   11/04/2020         14       6.0
8   Allen   d     10/18/2020        8.0   11/04/2020         10       2.0
11  Ash     a     10/02/2020        8.0   11/02/2020         15       7.0
12  Ash     b     10/08/2020        8.0   11/08/2020         18      10.0
13  Ash     e     10/13/2020        8.0   11/03/2020         11       3.0
15  Becca   a             NaN       0.0   11/04/2020          8       8.0
16  Becca   c             NaN       0.0   11/04/2020          3       3.0
0   Bob     a     10/28/2020        8.0   11/01/2020         12       4.0
1   Bob     b     10/23/2020        8.0   10/30/2020         15       7.0
2   Bob     c     10/22/2020        8.0   10/30/2020          9       1.0
14  Bob     d             NaN       0.0   11/03/2020          5       5.0
3   David   a     10/12/2020        8.0   11/05/2020         10       2.0
4   David   b     10/20/2020        8.0   11/03/2020          9       1.0
5   David   d     10/28/2020        8.0   11/01/2020         15       7.0
6   David   f     10/24/2020        8.0   10/30/2020         18      10.0
9   Kora    a     10/21/2020        8.0   11/01/2020         12       4.0
10  Kora    d     10/23/2020        8.0   11/03/2020          9       1.0
17  Tim     a             NaN       0.0   11/01/2020          8       8.0
18  Tim     b             NaN       0.0   11/03/2020          6       6.0
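If the week 2 row order matters, as it does in the expected interval column in the question, one possible variation is a left merge that starts from week2. This is a sketch rather than part of the original answer: the `date` column is dropped for brevity, and the `is_new` column name is an illustrative choice. The `indicator=True` option of `merge` adds a `_merge` column that marks rows with no week 1 counterpart.

```python
import pandas as pd

cols = ["name", "task", "hours"]
week1 = pd.DataFrame([
    ("Bob", "a", 8), ("Bob", "b", 8), ("Bob", "c", 8),
    ("David", "a", 8), ("David", "b", 8), ("David", "d", 8), ("David", "f", 8),
    ("Allen", "b", 8), ("Allen", "d", 8),
    ("Kora", "a", 8), ("Kora", "d", 8),
    ("Ash", "a", 8), ("Ash", "b", 8), ("Ash", "e", 8),
], columns=cols)
week2 = pd.DataFrame([
    ("Bob", "a", 12), ("Bob", "b", 15), ("Bob", "c", 9), ("Bob", "d", 5),
    ("David", "a", 10), ("David", "b", 9), ("David", "d", 15), ("David", "f", 18),
    ("Becca", "a", 8), ("Becca", "c", 3),
    ("Allen", "b", 14), ("Allen", "d", 10),
    ("Kora", "a", 12), ("Kora", "d", 9),
    ("Ash", "a", 15), ("Ash", "b", 18), ("Ash", "e", 11),
    ("Tim", "a", 8), ("Tim", "b", 6),
], columns=cols)

# A left merge keeps every week 2 row in its original order and attaches
# the matching week 1 hours where a (name, task) pair already existed.
df = week2.merge(week1, how="left", on=["name", "task"],
                 suffixes=("_w2", "_w1"), indicator=True)

# Rows found only in week 2 are new people or new tasks.
df["is_new"] = df["_merge"] == "left_only"

# Treat missing week 1 hours as 0 before taking the difference.
df["hours_w1"] = df["hours_w1"].fillna(0)
df["interval"] = df["hours_w2"] - df["hours_w1"]
print(df[["name", "task", "interval", "is_new"]])
```

Note that a left merge drops any name/task pair that appears only in week 1. That does not occur in this data, but the outer merge above would be the safer choice if it can.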