How to resample with Python

Time: 2019-05-23 12:01:49

Tags: python pandas

I have a DataFrame like the one below.

import pandas as pd

frame = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1),
                      'id': ['one', 'one', 'two', 'two', 'two', 'three', 'four'],
                      'date': ['2019-05-01', '2019-05-08', '2019-05-01', '2019-05-08',
                               '2019-05-15', '2019-05-01', '2019-05-15']})
print(frame)
# assign the result; to_datetime does not modify the column in place
frame['date'] = pd.to_datetime(frame['date'], yearfirst=True)

It looks like this (index omitted):

a  b     id        date
0  7    one  2019-05-01
1  6    one  2019-05-08
2  5    two  2019-05-01
3  4    two  2019-05-08
4  3    two  2019-05-15
5  2  three  2019-05-01
6  1   four  2019-05-15

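I want every id to have one row for each of the three dates. Missing rows should take the values from that id's previous date, or stay NA when there is no earlier row.

The expected DataFrame is: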

 a   b     id        date
 0   7    one  2019-05-01
 1   6    one  2019-05-08
 1   6    one  2019-05-15
 2   5    two  2019-05-01
 3   4    two  2019-05-08
 4   3    two  2019-05-15
 5   2  three  2019-05-01
 5   2  three  2019-05-08
 5   2  three  2019-05-15
NA  NA   four  2019-05-01
NA  NA   four  2019-05-08
 6   1   four  2019-05-15

How can I get this DataFrame by resampling? Thanks!

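1 Answer:

Answer 0 (score: 1)

Build a MultiIndex of every id and date combination, reindex to it, and forward-fill within each id: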

frame['date'] = pd.to_datetime(frame['date'], yearfirst=True)

# Create a MultiIndex from the unique values of both columns
mux = pd.MultiIndex.from_product([frame['id'].unique(),
                                  frame['date'].unique()], names=['id', 'date'])

# Add the missing rows with reindex, then forward-fill missing values within each id
frame = (frame.set_index(['id', 'date'])
              .reindex(mux)
              .groupby(level=0)
              .ffill()
              # depending on the pandas version, the group key may come back as a
              # column; errors='ignore' avoids a KeyError when it does not
              .drop(columns='id', errors='ignore')
              .reset_index()
              )

print(frame)
       id       date    a    b
0     one 2019-05-01  0.0  7.0
1     one 2019-05-08  1.0  6.0
2     one 2019-05-15  1.0  6.0
3     two 2019-05-01  2.0  5.0
4     two 2019-05-08  3.0  4.0
5     two 2019-05-15  4.0  3.0
6   three 2019-05-01  5.0  2.0
7   three 2019-05-08  5.0  2.0
8   three 2019-05-15  5.0  2.0
9    four 2019-05-01  NaN  NaN
10   four 2019-05-08  NaN  NaN
11   four 2019-05-15  6.0  1.0
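
The same steps can be wrapped in a small function. This is only a sketch: `complete_dates` and its parameter names are made up for illustration, and it assumes each (id, date) pair occurs at most once, since `reindex` needs a unique index. It also sorts the dates before building the grid, so the forward fill follows calendar order even if the input rows are shuffled, and casts `a` and `b` to the nullable `Int64` dtype so they print as integers, like the expected output in the question.

```python
import pandas as pd


def complete_dates(df, id_col="id", date_col="date"):
    """Give every id one row per observed date, forward-filling within each id.

    Hypothetical helper; it wraps the reindex approach from the answer.
    """
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])

    # Full grid: every id paired with every date that occurs anywhere.
    ids = out[id_col].unique()
    dates = out[date_col].drop_duplicates().sort_values()
    grid = pd.MultiIndex.from_product([ids, dates], names=[id_col, date_col])

    filled = (
        out.set_index([id_col, date_col])
        .reindex(grid)           # adds the missing (id, date) rows as NaN
        .groupby(level=id_col)
        .ffill()                 # fill forward, never across ids
    )
    # Depending on the pandas version, the group key may also come back
    # as a column, so drop it only if it is present.
    filled = filled.drop(columns=id_col, errors="ignore")
    return filled.reset_index()


frame = pd.DataFrame({
    "a": range(7),
    "b": range(7, 0, -1),
    "id": ["one", "one", "two", "two", "two", "three", "four"],
    "date": ["2019-05-01", "2019-05-08", "2019-05-01", "2019-05-08",
             "2019-05-15", "2019-05-01", "2019-05-15"],
})

result = complete_dates(frame)
# Nullable integers keep whole numbers readable while still allowing <NA>.
result = result.astype({"a": "Int64", "b": "Int64"})
print(result)
```

The two leading rows for `four` stay missing because a forward fill has nothing earlier to copy from. If those should be filled as well, a backward fill per id could follow the forward fill.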