Resampling a pandas DataFrame from almost-weekly to daily

Time: 2021-03-19 08:53:46

Tags: python pandas dataframe pandas-resample

What is the most concise way to resample this DataFrame:

>>> uneven = pd.DataFrame({'a': [0, 12, 19]}, index=pd.DatetimeIndex(['2020-12-08', '2020-12-20', '2020-12-27']))
>>> print(uneven)
             a
2020-12-08   0
2020-12-20  12
2020-12-27  19

...into this DataFrame:

>>> daily = pd.DataFrame({'a': range(20)}, index=pd.date_range('2020-12-08', periods=3*7-1, freq='D'))
>>> print(daily)
             a
2020-12-08   0
2020-12-09   1
...
2020-12-19  11
2020-12-20  12
2020-12-21  13
...
2020-12-27  19

Note: there are 12 days between 8 and 20 December, but only 7 days between 20 and 27 December, so the input is not evenly spaced.
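The uneven spacing can be checked directly from the index. This is only a small sketch that rebuilds the `uneven` frame from above; the name `gaps` is introduced here for illustration.

```python
import pandas as pd

uneven = pd.DataFrame(
    {'a': [0, 12, 19]},
    index=pd.DatetimeIndex(['2020-12-08', '2020-12-20', '2020-12-27']),
)

# Difference between consecutive index entries; the first entry has no
# predecessor, so its gap is NaT.
gaps = uneven.index.to_series().diff()
print(gaps)
```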

Also, to be clear about the kind of interpolation/resampling I want:

>>> print(daily.diff())
              a
2020-12-08  NaN
2020-12-09  1.0
2020-12-10  1.0
...
2020-12-19  1.0
2020-12-20  1.0
2020-12-21  1.0
...
2020-12-27  1.0

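The real data has a hierarchical index and multiple columns, but I wanted to start with something I could understand. It looks like this: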

                      first_dose  second_dose
date       areaCode                          
2020-12-08 E92000001         0.0          0.0
           N92000002         0.0          0.0
           S92000003         0.0          0.0
           W92000004         0.0          0.0
2020-12-20 E92000001    574829.0          0.0
           N92000002     16068.0          0.0
           S92000003     60333.0          0.0
           W92000004     24056.0          0.0
2020-12-27 E92000001    267809.0          0.0
           N92000002     14948.0          0.0
           S92000003     34535.0          0.0
           W92000004     12495.0          0.0
2021-01-03 E92000001    330037.0      20660.0
           N92000002      9669.0       1271.0
           S92000003     21446.0         44.0
           W92000004     14205.0         27.0

1 answer:

Answer 0: (score: 1)

I think you need:

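# Move areaCode out of the index so that date is the only index level,
# then upsample each area to daily frequency and interpolate linearly.
df = (df.reset_index('areaCode')
        .groupby('areaCode')[['first_dose', 'second_dose']]
        .resample('D')
        .interpolate())
print(df)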
                         first_dose  second_dose
areaCode  date                                  
E92000001 2020-12-08       0.000000     0.000000
          2020-12-09   47902.416667     0.000000
          2020-12-10   95804.833333     0.000000
          2020-12-11  143707.250000     0.000000
          2020-12-12  191609.666667     0.000000
                            ...          ...
W92000004 2020-12-30   13227.857143    11.571429
          2020-12-31   13472.142857    15.428571
          2021-01-01   13716.428571    19.285714
          2021-01-02   13960.714286    23.142857
          2021-01-03   14205.000000    27.000000

[108 rows x 2 columns]
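
As a self-contained sketch of the same idea, the snippet below first applies `resample('D').interpolate()` to the simple single-column example from the question, then repeats the grouped version on a cut-down copy of the hierarchical data (two of the four area codes, values copied from the question). It assumes linear interpolation is what is wanted; the names `simple`, `sample` and `result` are only illustrative.

```python
import pandas as pd

# Simple case: one column, unevenly spaced dates.
uneven = pd.DataFrame(
    {'a': [0, 12, 19]},
    index=pd.DatetimeIndex(['2020-12-08', '2020-12-20', '2020-12-27']),
)
# Upsample to one row per day, then fill the new rows by linear interpolation.
simple = uneven.resample('D').interpolate()
print(simple)

# Hierarchical case: (date, areaCode) index with two value columns.
dates = pd.to_datetime(['2020-12-08', '2020-12-20', '2020-12-27', '2021-01-03'])
index = pd.MultiIndex.from_product(
    [dates, ['E92000001', 'W92000004']], names=['date', 'areaCode']
)
sample = pd.DataFrame(
    {
        'first_dose': [0.0, 0.0, 574829.0, 24056.0,
                       267809.0, 12495.0, 330037.0, 14205.0],
        'second_dose': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 20660.0, 27.0],
    },
    index=index,
)

# Resampling needs a plain DatetimeIndex, so areaCode becomes a column
# and is used as the grouping key instead.
result = (sample.reset_index('areaCode')
                .groupby('areaCode')[['first_dose', 'second_dose']]
                .resample('D')
                .interpolate())
print(result)
```

The result is indexed by `(areaCode, date)`; if the original `(date, areaCode)` order is preferred, `result.swaplevel().sort_index()` should restore it.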