I have a dataframe like the one below.
import pandas as pd
frame = pd.DataFrame({"a": range(7), "b": range(7, 0, -1),
                      "id": ["one", "one", "two", "two", "two", "three", "four"],
                      "date": ["2019-05-01", "2019-05-08", "2019-05-01", "2019-05-08",
                               "2019-05-15", "2019-05-01", "2019-05-15"]})
print(frame)
# pd.to_datetime returns a new Series, so assign it back to the column
frame["date"] = pd.to_datetime(frame["date"], yearfirst=True)
It looks like this:
a   b   id     date
0   7   one    2019-05-01
1   6   one    2019-05-08
2   5   two    2019-05-01
3   4   two    2019-05-08
4   3   two    2019-05-15
5   2   three  2019-05-01
6   1   four   2019-05-15
I want every id to have three rows, one for each date.
The expected dataframe is:
a   b   id     date
0   7   one    2019-05-01
1   6   one    2019-05-08
1   6   one    2019-05-15
2   5   two    2019-05-01
3   4   two    2019-05-08
4   3   two    2019-05-15
5   2   three  2019-05-01
5   2   three  2019-05-08
5   2   three  2019-05-15
NA  NA  four   2019-05-01
NA  NA  four   2019-05-08
6   1   four   2019-05-15
How can I get this dataframe by resampling? Thanks!
Answer 0 (score: 1)
Use:
frame['date'] = pd.to_datetime(frame['date'], yearfirst=True)
# create MultiIndex from the unique values of both columns
mux = pd.MultiIndex.from_product([frame['id'].unique(),
                                  frame['date'].unique()], names=['id', 'date'])
# add missing rows by reindex, then forward fill missing values per group
frame = (frame.set_index(['id', 'date'])
              .reindex(mux)
              .groupby(level=0)
              .ffill()
              # some older pandas releases return the group key as an extra
              # column here; errors='ignore' makes this a no-op otherwise
              .drop(columns='id', errors='ignore')
              .reset_index()
         )
print(frame)
       id       date    a    b
0     one 2019-05-01  0.0  7.0
1     one 2019-05-08  1.0  6.0
2     one 2019-05-15  1.0  6.0
3     two 2019-05-01  2.0  5.0
4     two 2019-05-08  3.0  4.0
5     two 2019-05-15  4.0  3.0
6   three 2019-05-01  5.0  2.0
7   three 2019-05-08  5.0  2.0
8   three 2019-05-15  5.0  2.0
9    four 2019-05-01  NaN  NaN
10   four 2019-05-08  NaN  NaN
11   four 2019-05-15  6.0  1.0
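
If you need this more than once, the same idea can be wrapped in a small function. This is only a sketch under a few assumptions: the name complete_dates and its parameters are made up for illustration, the dates are sorted before the product is built (the answer above relies on their order of first appearance), and the optional cast to the nullable Int64 dtype assumes a and b hold whole numbers. The cast is there only to get the integer-plus-NA look of the expected output instead of floats.

```python
import pandas as pd


def complete_dates(df, id_col="id", date_col="date"):
    """Hypothetical helper: one row per (id, date) pair, forward filled per id."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])

    # All (id, date) combinations: ids in order of first appearance, dates sorted.
    dates = pd.Index(out[date_col].unique()).sort_values()
    full_index = pd.MultiIndex.from_product(
        [out[id_col].unique(), dates], names=[id_col, date_col]
    )

    filled = (
        out.set_index([id_col, date_col])
        .reindex(full_index)       # missing combinations become NaN rows
        .groupby(level=id_col)
        .ffill()                   # fill forward inside each id only
        # no-op on pandas versions that do not add the key as a column
        .drop(columns=id_col, errors="ignore")
    )
    return filled.reset_index()


frame = pd.DataFrame({
    "a": range(7),
    "b": range(7, 0, -1),
    "id": ["one", "one", "two", "two", "two", "three", "four"],
    "date": ["2019-05-01", "2019-05-08", "2019-05-01", "2019-05-08",
             "2019-05-15", "2019-05-01", "2019-05-15"],
})

result = complete_dates(frame)

# Optional: nullable integers keep whole numbers and show <NA> for the gaps.
result_int = result.astype({"a": "Int64", "b": "Int64"})
print(result_int)
```

Note that forward filling cannot fill the first two rows of id four, because there is no earlier value in that group, so they stay missing exactly as in the expected output.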