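I have a simple data frame (data from the Tropical Rainfall Measuring Mission, TRMM, to give some context) with one column for the datetime and one column for the precipitation measurement, like this: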
ppt
date
1998-01-01 03:00:00 0.00
1998-01-01 06:00:00 0.00
1998-01-01 09:00:00 0.03
1998-01-01 12:00:00 0.20
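The readings are taken every three hours, and each value is the hourly average rainfall over the preceding three hours. I would like to create a data frame that contains hourly rainfall measurements, so that it looks like this: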
ppt
date
1998-01-01 01:00:00 0.00
1998-01-01 02:00:00 0.00
1998-01-01 03:00:00 0.00
1998-01-01 04:00:00 0.00
1998-01-01 05:00:00 0.00
1998-01-01 06:00:00 0.00
1998-01-01 07:00:00 0.03
1998-01-01 08:00:00 0.03
1998-01-01 09:00:00 0.03
1998-01-01 10:00:00 0.20
1998-01-01 11:00:00 0.20
1998-01-01 12:00:00 0.20
Any ideas on how I might do this?
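For anyone who wants to reproduce this, the sample above can be rebuilt roughly as follows. This is only a sketch: the name `df` and the four hard-coded rows are assumptions taken from the excerpt shown, not the full TRMM dataset. Because each value is an hourly average over three hours, the total for a window is three times the reading.

```python
import pandas as pd

# Rebuild the four sample readings shown in the question.
# The index is named "date" and the single column "ppt", as in the printout.
df = pd.DataFrame(
    {"ppt": [0.00, 0.00, 0.03, 0.20]},
    index=pd.DatetimeIndex(
        [
            "1998-01-01 03:00:00",
            "1998-01-01 06:00:00",
            "1998-01-01 09:00:00",
            "1998-01-01 12:00:00",
        ],
        name="date",
    ),
)

# Each reading is an hourly average over the preceding three hours,
# so the total rainfall in one 3-hour window is 3 * ppt.
total = (df["ppt"] * 3).sum()

print(df)
print(total)
```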
Answer 0: (score: 1)
To get exactly what is shown above:
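import numpy as np
import pandas as pd

# repeated decreasing number of hours
# [2 hr, 1 hr, 0 hr, 2 hr, 1 hr, 0 hr, ...]
# (unit='h' is used because the uppercase 'H' is deprecated since pandas 2.2)
d = np.tile(np.arange(3)[::-1], len(df)) * pd.Timedelta(1, unit='h')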
# repeat the index 3 times for every entry
# [3:00, 3:00, 3:00, 6:00, 6:00, 6:00, ...]
i = df.index.repeat(3)
df_ = df.loc[i]
# take care of differences
# [3:00, 3:00, 3:00, 6:00, 6:00, 6:00, ...]
# minus
# [2 hr, 1 hr, 0 hr, 2 hr, 1 hr, 0 hr, ...]
# [1:00, 2:00, 3:00, 4:00, 5:00, 6:00, ...]
df_.index -= d
df_
ppt
date
1998-01-01 01:00:00 0.00
1998-01-01 02:00:00 0.00
1998-01-01 03:00:00 0.00
1998-01-01 04:00:00 0.00
1998-01-01 05:00:00 0.00
1998-01-01 06:00:00 0.00
1998-01-01 07:00:00 0.03
1998-01-01 08:00:00 0.03
1998-01-01 09:00:00 0.03
1998-01-01 10:00:00 0.20
1998-01-01 11:00:00 0.20
1998-01-01 12:00:00 0.20
asfreq and resample only get you this far:
df.asfreq('h').bfill()
ppt
date
1998-01-01 03:00:00 0.00
1998-01-01 04:00:00 0.00
1998-01-01 05:00:00 0.00
1998-01-01 06:00:00 0.00
1998-01-01 07:00:00 0.03
1998-01-01 08:00:00 0.03
1998-01-01 09:00:00 0.03
1998-01-01 10:00:00 0.20
1998-01-01 11:00:00 0.20
1998-01-01 12:00:00 0.20
The first two rows are missing at the beginning:
1998-01-01 01:00:00 0.00
1998-01-01 02:00:00 0.00
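One possible way to recover those two leading rows, offered as a sketch rather than the only approach, is to build the hourly index explicitly so that it starts two hours before the first reading, then reindex and back-fill. The frame `df` below is rebuilt from the question's four rows, and the names `hourly` and `out` are made up for illustration.

```python
import pandas as pd

# Sample frame from the question (four 3-hourly readings).
df = pd.DataFrame(
    {"ppt": [0.00, 0.00, 0.03, 0.20]},
    index=pd.date_range("1998-01-01 03:00", periods=4, freq="3h", name="date"),
)

# Hourly target index that starts two hours before the first reading,
# so the first 3-hour window (01:00, 02:00, 03:00) is fully covered.
hourly = pd.date_range(
    start=df.index[0] - pd.Timedelta(hours=2),
    end=df.index[-1],
    freq="h",
    name="date",
)

# reindex leaves NaN at the new timestamps; bfill then copies each
# reading back over the two hours that precede it.
out = df.reindex(hourly).bfill()
print(out)
```

Unlike asfreq, which only spans from the first to the last existing timestamp, the explicit range decides where the hourly series begins.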
Answer 1: (score: 0)
You can use resample with a backward fill, as long as the start time is specified correctly:
import pandas as pd

# specify start and end times so that the range to fill is clear
start = pd.Timestamp('1998-01-01 00:00:00')
end = pd.Timestamp('1998-01-01 12:00:00')
# five evenly spaced timestamps, i.e. one every three hours
t = pd.date_range(start, end, periods=5)
df = pd.DataFrame(index=t)
# populate existing values (the 00:00 entry is an added starting point)
df['ppt'] = [0., 0., 0., 0.03, 0.2]
# resample to hourly and fill backwards
# ('h' is the hourly alias; the uppercase 'H' is deprecated since pandas 2.2)
df.resample('1h').bfill()
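If the frame already exists with its first reading at 03:00, the same idea might be applied by prepending a placeholder row three hours earlier and dropping it again after the fill. This is a sketch under assumptions: `df` is rebuilt from the question's four rows, and the names `pad`, `padded` and `hourly` and the placeholder value 0.0 are hypothetical. The placeholder's value does not matter, since the backward fill never propagates it to later hours.

```python
import pandas as pd

# Sample frame from the question (four 3-hourly readings).
df = pd.DataFrame(
    {"ppt": [0.00, 0.00, 0.03, 0.20]},
    index=pd.date_range("1998-01-01 03:00", periods=4, freq="3h", name="date"),
)

# Hypothetical placeholder row three hours before the first reading.
# It only extends the time range; its value is discarded below.
start = df.index[0] - pd.Timedelta(hours=3)
pad = pd.DataFrame({"ppt": [0.0]}, index=pd.DatetimeIndex([start], name="date"))
padded = pd.concat([pad, df])

# Upsample to hourly, fill backwards, then drop the placeholder row
# so the result starts at 01:00 as in the question.
hourly = padded.resample("h").bfill().iloc[1:]
print(hourly)
```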