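I have the DataFrame below. I want to insert rows so that each person (Toby, Jane, David) has one row for every month of 2020. Where x or y has no value, it should be filled with 0.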
ID Name Date x y
0 001 Toby 2020-01-01 15 NaN
1 001 Toby 2020-02-01 12 7
2 001 Toby 2020-05-01 7 1
3 001 Toby 2020-07-01 NaN 1
4 002 Jane 2020-11-01 20 1
5 002 Jane 2020-12-01 21 10
6 003 David 2020-07-01 -3 2
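For anyone who wants to reproduce the input, the frame above can be rebuilt roughly as follows. This is a sketch under two assumptions that the post does not state: that `ID` is stored as a zero-padded string and that `Date` is a datetime column.

```python
import numpy as np
import pandas as pd

# Sample data copied from the table above.
# Assumptions: ID is a zero-padded string, Date is datetime64.
df = pd.DataFrame({
    "ID": ["001", "001", "001", "001", "002", "002", "003"],
    "Name": ["Toby", "Toby", "Toby", "Toby", "Jane", "Jane", "David"],
    "Date": pd.to_datetime([
        "2020-01-01", "2020-02-01", "2020-05-01", "2020-07-01",
        "2020-11-01", "2020-12-01", "2020-07-01",
    ]),
    "x": [15, 12, 7, np.nan, 20, 21, -3],
    "y": [np.nan, 7, 1, 1, 1, 10, 2],
})

print(df)
```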
The resulting DataFrame should have 36 rows, 12 per person.
ID Name Date x y
0 001 Toby 2020-01-01 15 0
1 001 Toby 2020-02-01 12 7
2 001 Toby 2020-03-01 0 0
3 001 Toby 2020-04-01 0 0
4 001 Toby 2020-05-01 7 1
5 001 Toby 2020-06-01 0 0
6 001 Toby 2020-07-01 0 1
7 001 Toby 2020-08-01 0 0
8 001 Toby 2020-09-01 0 0
9 001 Toby 2020-10-01 0 0
10 001 Toby 2020-11-01 0 0
11 001 Toby 2020-12-01 0 0
12 002 Jane 2020-01-01 0 0
13 002 Jane 2020-02-01 0 0
14 002 Jane 2020-03-01 0 0
15 002 Jane 2020-04-01 0 0
16 002 Jane 2020-05-01 0 0
17 002 Jane 2020-06-01 0 0
18 002 Jane 2020-07-01 0 0
19 002 Jane 2020-08-01 0 0
20 002 Jane 2020-09-01 0 0
21 002 Jane 2020-10-01 0 0
22 002 Jane 2020-11-01 20 1
23 002 Jane 2020-12-01 21 10
24 003 David 2020-01-01 0 0
25 003 David 2020-02-01 0 0
26 003 David 2020-03-01 0 0
27 003 David 2020-04-01 0 0
28 003 David 2020-05-01 0 0
29 003 David 2020-06-01 0 0
30 003 David 2020-07-01 -3 2
31 003 David 2020-08-01 0 0
32 003 David 2020-09-01 0 0
33 003 David 2020-10-01 0 0
34 003 David 2020-11-01 0 0
35 003 David 2020-12-01 0 0
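I looked into reindex and managed to get it working for a single Series, but I have not found a way to dynamically generate the rows on a DataFrame and then fill in the missing values.
Any help would be appreciated.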
Answer 0 (score: 4)
You can use reindex for this:
import pandas as pd

# list of the desired dates
# make sure that it has the same type as `Date` in your data
# here I assume zero-padded strings such as '2020-01-01'
dates = pd.Series([f'2020-{x:02d}-01' for x in range(1, 13)], name='Date')
(df.set_index(['Date']).groupby(['ID', 'Name'])
   # errors='ignore': newer pandas versions may not pass the grouping columns to apply
   .apply(lambda x: x.drop(columns=['ID', 'Name'], errors='ignore').reindex(dates).fillna(0))
   .reset_index()
)
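A variant of the same reindex idea, offered as a sketch rather than as the answerer's code: build the full (ID, Name, Date) index up front and reindex once, which avoids `groupby.apply`. The sample frame, the datetime `Date` column, and the names `months`, `people`, `full_index` and `result` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumed: string IDs, datetime dates).
df = pd.DataFrame({
    "ID": ["001", "001", "001", "001", "002", "002", "003"],
    "Name": ["Toby", "Toby", "Toby", "Toby", "Jane", "Jane", "David"],
    "Date": pd.to_datetime([
        "2020-01-01", "2020-02-01", "2020-05-01", "2020-07-01",
        "2020-11-01", "2020-12-01", "2020-07-01",
    ]),
    "x": [15, 12, 7, np.nan, 20, 21, -3],
    "y": [np.nan, 7, 1, 1, 1, 10, 2],
})

# First day of every month in 2020.
months = pd.date_range("2020-01-01", "2020-12-01", freq="MS")

# One (ID, Name) pair per person, in order of first appearance.
people = df[["ID", "Name"]].drop_duplicates()

# Every person paired with every month.
full_index = pd.MultiIndex.from_tuples(
    [(pid, name, month)
     for pid, name in people.itertuples(index=False)
     for month in months],
    names=["ID", "Name", "Date"],
)

result = (
    df.set_index(["ID", "Name", "Date"])
      .reindex(full_index)   # adds the missing person/month rows as NaN
      .fillna(0)             # fills both new rows and pre-existing NaNs
      .reset_index()
)

print(len(result))
```

This relies on each (ID, Name, Date) combination appearing at most once in the input; reindexing a frame with duplicate index entries raises an error.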
Answer 1 (score: 1)
Another approach is to build the Cartesian product of the dates and the names, then merge it with the original DataFrame.
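# `Date` must be datetime for the merge below to match; convert it if it is stored as strings
df['Date'] = pd.to_datetime(df['Date'])
dates = pd.date_range(start='01-01-2020',end='12-01-2020',freq='MS')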
dates = pd.DataFrame(dates,columns=['Date']).assign(key='key')
names = df[['Name','ID']].drop_duplicates()
df1 = pd.merge(names.assign(key='key'),dates,on='key',how='outer').drop('key',axis=1)
df2 = pd.merge(df,df1,how='right',on=['Date','Name','ID']).fillna(0)\
.sort_values(['ID','Date'])
print(df2)
ID Name Date x y
0 1 Toby 2020-01-01 15.0 0.0
1 1 Toby 2020-02-01 12.0 7.0
7 1 Toby 2020-03-01 0.0 0.0
8 1 Toby 2020-04-01 0.0 0.0
2 1 Toby 2020-05-01 7.0 1.0
9 1 Toby 2020-06-01 0.0 0.0
3 1 Toby 2020-07-01 0.0 1.0
10 1 Toby 2020-08-01 0.0 0.0
11 1 Toby 2020-09-01 0.0 0.0
12 1 Toby 2020-10-01 0.0 0.0
13 1 Toby 2020-11-01 0.0 0.0
14 1 Toby 2020-12-01 0.0 0.0
15 2 Jane 2020-01-01 0.0 0.0
16 2 Jane 2020-02-01 0.0 0.0
17 2 Jane 2020-03-01 0.0 0.0
18 2 Jane 2020-04-01 0.0 0.0
19 2 Jane 2020-05-01 0.0 0.0
20 2 Jane 2020-06-01 0.0 0.0
21 2 Jane 2020-07-01 0.0 0.0
22 2 Jane 2020-08-01 0.0 0.0
23 2 Jane 2020-09-01 0.0 0.0
24 2 Jane 2020-10-01 0.0 0.0
4 2 Jane 2020-11-01 20.0 1.0
5 2 Jane 2020-12-01 21.0 10.0
25 3 David 2020-01-01 0.0 0.0
26 3 David 2020-02-01 0.0 0.0
27 3 David 2020-03-01 0.0 0.0
28 3 David 2020-04-01 0.0 0.0
29 3 David 2020-05-01 0.0 0.0
30 3 David 2020-06-01 0.0 0.0
6 3 David 2020-07-01 -3.0 2.0
31 3 David 2020-08-01 0.0 0.0
32 3 David 2020-09-01 0.0 0.0
33 3 David 2020-10-01 0.0 0.0
34 3 David 2020-11-01 0.0 0.0
35 3 David 2020-12-01 0.0 0.0
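On pandas 1.2 or later, the dummy `key` column is not needed because `merge` accepts `how='cross'`. The following is a sketch of that variant under stated assumptions, not the answerer's code: the sample frame, the string `ID` values, and the names `months`, `people`, `grid` and `result` are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question (assumed: string IDs, datetime dates).
df = pd.DataFrame({
    "ID": ["001", "001", "001", "001", "002", "002", "003"],
    "Name": ["Toby", "Toby", "Toby", "Toby", "Jane", "Jane", "David"],
    "Date": pd.to_datetime([
        "2020-01-01", "2020-02-01", "2020-05-01", "2020-07-01",
        "2020-11-01", "2020-12-01", "2020-07-01",
    ]),
    "x": [15, 12, 7, np.nan, 20, 21, -3],
    "y": [np.nan, 7, 1, 1, 1, 10, 2],
})

# One row per month start in 2020.
months = pd.DataFrame(
    {"Date": pd.date_range("2020-01-01", "2020-12-01", freq="MS")}
)

# One row per person.
people = df[["ID", "Name"]].drop_duplicates()

# Cartesian product of people and months (requires pandas >= 1.2).
grid = people.merge(months, how="cross")

# Left-join the observed values onto the full grid, then fill the gaps.
result = (
    grid.merge(df, how="left", on=["ID", "Name", "Date"])
        .fillna(0)
        .sort_values(["ID", "Date"], ignore_index=True)
)

print(result.shape)
```

Joining from the full grid with `how='left'` keeps one row per person and month regardless of which rows exist in the original data.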