熊猫根据值插入行并填充0

时间:2020-05-20 15:31:35

标签: python pandas nan fill reindex

我有以下数据框,具有以下值。 我想插入行,以便每个人(Toby,Jane,David)以及2020年的每个月都有一行。 如果x或y没有值,则填充0。

    ID  Name    Date        x   y
0   001 Toby    2020-01-01  15  NaN
1   001 Toby    2020-02-01  12  7
2   001 Toby    2020-05-01  7   1
3   001 Toby    2020-07-01  NaN 1
4   002 Jane    2020-11-01  20  1
5   002 Jane    2020-12-01  21  10
6   003 David   2020-07-01  -3  2

结果数据框应有36行,每个人12条。

ID  Name        Date        x   y
0   001 Toby    2020-01-01  15  0
1   001 Toby    2020-02-01  12  7
2   001 Toby    2020-03-01  0   0
3   001 Toby    2020-04-01  0   0
4   001 Toby    2020-05-01  7   1
5   001 Toby    2020-06-01  0   0
6   001 Toby    2020-07-01  0   1
7   001 Toby    2020-08-01  0   0
8   001 Toby    2020-09-01  0   0
9   001 Toby    2020-10-01  0   0
10  001 Toby    2020-11-01  0   0
11  001 Toby    2020-12-01  0   0
12  002 Jane    2020-01-01  0   0
13  002 Jane    2020-02-01  0   0
14  002 Jane    2020-03-01  0   0
15  002 Jane    2020-04-01  0   0
16  002 Jane    2020-05-01  0   0
17  002 Jane    2020-06-01  0   0
18  002 Jane    2020-07-01  0   0
19  002 Jane    2020-08-01  0   0
20  002 Jane    2020-09-01  0   0
21  002 Jane    2020-10-01  0   0
22  002 Jane    2020-11-01  20  1
23  002 Jane    2020-12-01  21  10
24  003 David   2020-01-01  0   0
25  003 David   2020-02-01  0   0
26  003 David   2020-03-01  0   0
27  003 David   2020-04-01  0   0
28  003 David   2020-05-01  0   0
29  003 David   2020-06-01  0   0
30  003 David   2020-07-01  -3  2
31  003 David   2020-08-01  0   0
32  003 David   2020-09-01  0   0
33  003 David   2020-10-01  0   0
34  003 David   2020-11-01  0   0
35  003 David   2020-12-01  0   0

我研究了reindex,并设法使它适用于单个系列。但是我还没有找到在数据框上动态生成行然后填充缺失值的方法。

任何帮助将不胜感激。

2 个答案:

答案 0 :(得分:4)

您可以将reindex用于此目的:

# list of the desired dates
# make sure that it has the same type with `Date` in your data
# here I assume strings
dates = pd.Series([f'2020-{x}-01' for x in range(1,13)]), name='Date')

(df.set_index(['Date']).groupby(['ID','Name'])
   .apply(lambda x: x.drop(['ID', 'Name'],axis=1).reindex(dates).fillna(0))
   .reset_index()
)

答案 1 :(得分:1)

另一种方法是将日期和您的姓名与原始数据框合并的笛卡尔积。

dates = pd.date_range(start='01-01-2020',end='12-01-2020',freq='MS')
dates = pd.DataFrame(dates,columns=['Date']).assign(key='key')

names = df[['Name','ID']].drop_duplicates()

df1 = pd.merge(names.assign(key='key'),dates,on='key',how='outer').drop('key',axis=1)

df2 = pd.merge(df,df1,how='right',on=['Date','Name','ID']).fillna(0)\
                                          .sort_values(['ID','Date'])

print(df2)

    ID   Name       Date     x     y
0    1   Toby 2020-01-01  15.0   0.0
1    1   Toby 2020-02-01  12.0   7.0
7    1   Toby 2020-03-01   0.0   0.0
8    1   Toby 2020-04-01   0.0   0.0
2    1   Toby 2020-05-01   7.0   1.0
9    1   Toby 2020-06-01   0.0   0.0
3    1   Toby 2020-07-01   0.0   1.0
10   1   Toby 2020-08-01   0.0   0.0
11   1   Toby 2020-09-01   0.0   0.0
12   1   Toby 2020-10-01   0.0   0.0
13   1   Toby 2020-11-01   0.0   0.0
14   1   Toby 2020-12-01   0.0   0.0
15   2   Jane 2020-01-01   0.0   0.0
16   2   Jane 2020-02-01   0.0   0.0
17   2   Jane 2020-03-01   0.0   0.0
18   2   Jane 2020-04-01   0.0   0.0
19   2   Jane 2020-05-01   0.0   0.0
20   2   Jane 2020-06-01   0.0   0.0
21   2   Jane 2020-07-01   0.0   0.0
22   2   Jane 2020-08-01   0.0   0.0
23   2   Jane 2020-09-01   0.0   0.0
24   2   Jane 2020-10-01   0.0   0.0
4    2   Jane 2020-11-01  20.0   1.0
5    2   Jane 2020-12-01  21.0  10.0
25   3  David 2020-01-01   0.0   0.0
26   3  David 2020-02-01   0.0   0.0
27   3  David 2020-03-01   0.0   0.0
28   3  David 2020-04-01   0.0   0.0
29   3  David 2020-05-01   0.0   0.0
30   3  David 2020-06-01   0.0   0.0
6    3  David 2020-07-01  -3.0   2.0
31   3  David 2020-08-01   0.0   0.0
32   3  David 2020-09-01   0.0   0.0
33   3  David 2020-10-01   0.0   0.0
34   3  David 2020-11-01   0.0   0.0
35   3  David 2020-12-01   0.0   0.0