I am trying to distribute the total for each period evenly across the sub-periods of a higher sampling frequency.
What I did:
>>> import pandas
>>> rng = pandas.PeriodIndex(start='2014-01-01', periods=2, freq='W')
>>> ts = pandas.Series([i+1 for i in range(len(rng))], index=rng)
>>> ts
2013-12-30/2014-01-05 1
2014-01-06/2014-01-12 2
Freq: W-SUN, dtype: float64
>>> ts.resample('D')
2013-12-30 1
2013-12-31 NaN
2014-01-01 NaN
2014-01-02 NaN
2014-01-03 NaN
2014-01-04 NaN
2014-01-05 NaN
2014-01-06 2
2014-01-07 NaN
2014-01-08 NaN
2014-01-09 NaN
2014-01-10 NaN
2014-01-11 NaN
2014-01-12 NaN
Freq: D, dtype: float64
What I actually want is:
>>> ts.resample('D', some_miracle_thing)
2013-12-30 1/7
2013-12-31 1/7
2014-01-01 1/7
2014-01-02 1/7
2014-01-03 1/7
2014-01-04 1/7
2014-01-05 1/7
2014-01-06 2/7
2014-01-07 2/7
2014-01-08 2/7
2014-01-09 2/7
2014-01-10 2/7
2014-01-11 2/7
2014-01-12 2/7
Freq: D, dtype: float64
Is there a way to do this, perhaps with an x/7 lambda function?
Answer 0 (score: 4)
It is a bit convoluted, but does this work?
First, when resampling, add .groupby(level=0) to preserve the original timestamps (based on this answer):
rs = ts.groupby(level=0).resample('D')
Then group by the first level of the resulting MultiIndex and apply the desired operation:
In [285]: rs.groupby(level=0).transform(lambda x: x.iloc[0] / float(len(x)))
Out[285]:
2013-12-30/2014-01-05 2013-12-30 0.142857
2013-12-31 0.142857
2014-01-01 0.142857
2014-01-02 0.142857
2014-01-03 0.142857
2014-01-04 0.142857
2014-01-05 0.142857
2014-01-06/2014-01-12 2014-01-06 0.285714
2014-01-07 0.285714
2014-01-08 0.285714
2014-01-09 0.285714
2014-01-10 0.285714
2014-01-11 0.285714
2014-01-12 0.285714
dtype: float64
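The same group-then-transform idea can be sketched without resampling a PeriodIndex, which may help on newer pandas versions where the calls above behave differently. This is only a sketch under assumptions: the names weekly, days, week_of_day, filled and daily are illustrative and do not come from the answer, and the daily index is built by hand with period_range and asfreq.

```python
import pandas as pd

# Weekly totals, as in the question (values stored as floats).
weekly = pd.Series(
    [1.0, 2.0],
    index=pd.period_range("2014-01-01", periods=2, freq="W"),
)

# Every day covered by the weekly periods.
days = pd.period_range(
    weekly.index[0].start_time, weekly.index[-1].end_time, freq="D"
)

# The week each day belongs to; this plays the role of the first MultiIndex level.
week_of_day = days.asfreq("W")

# Repeat each weekly total on all of its days, then divide by the group size.
filled = pd.Series(weekly.reindex(week_of_day).to_numpy(), index=days)
daily = filled.groupby(week_of_day).transform(lambda x: x.iloc[0] / len(x))
```

Here daily is indexed by day only, and each week's values add up to the original weekly total.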
Answer 1 (score: 1)
This works, but I find it ugly:
>>> rs = ts.resample('D', fill_method="pad")
>>> rs/7
2013-12-30 0.142857
2013-12-31 0.142857
2014-01-01 0.142857
2014-01-02 0.142857
2014-01-03 0.142857
2014-01-04 0.142857
2014-01-05 0.142857
2014-01-06 0.285714
2014-01-07 0.285714
2014-01-08 0.285714
2014-01-09 0.285714
2014-01-10 0.285714
2014-01-11 0.285714
2014-01-12 0.285714
Freq: D, dtype: float64
Is there no built-in function for this basic operation?
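Lacking a one-line built-in, a small helper can avoid the hard-coded 7 by counting the sub-periods of each period. The following is a minimal sketch, not a pandas API: spread_evenly is a hypothetical name, and it assumes the Series has a PeriodIndex and numeric values.

```python
import pandas as pd

def spread_evenly(ts, freq="D"):
    """Split each period's value evenly over its sub-periods of frequency freq."""
    pieces = []
    for period, value in ts.items():
        # All sub-periods that fall inside this period.
        sub = pd.period_range(period.start_time, period.end_time, freq=freq)
        pieces.append(pd.Series(value / len(sub), index=sub))
    return pd.concat(pieces)

weekly = pd.Series(
    [1, 2], index=pd.period_range("2014-01-01", periods=2, freq="W")
)
daily = spread_evenly(weekly)

# Months have different lengths, so the divisor changes per period.
monthly = pd.Series(
    [31.0, 28.0], index=pd.period_range("2015-01", periods=2, freq="M")
)
per_day = spread_evenly(monthly)
```

The explicit loop is slow for long series, but it makes the per-period divisor obvious.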
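Answer 2 (score: 0)
I hate this solution, but it works for upsampling when you are not sure how many new intervals there will be. Going from a week to days is easy: there are always 7 days in a week. But I have found that the number of intervals produced by upsampling is often not known in advance, and this solution handles that case.
The idea is to put the number of post-resample intervals into the original, pre-resample DataFrame, then resample and divide the data by the interval count. Side note: this is a DataFrame, not a Series.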
# Create unique group IDs by simply using the existing index (Assumes an integer, non-duplicated index)
df['group'] = df.index
# Get the count of intervals for each post-resampled timestamp.
df['count'] = df.set_index('timestamp').resample('15min').ffill()['group'].value_counts()
# Resample all data again and fill so that the count is now included in every row.
df = df.set_index('timestamp').resample('15min').ffill()
# Apply the division on the entire dataframe and clean up.
df = df.div(df['count'], axis = 0).reset_index().drop(['group','count'], axis = 1)
I wrapped it in a function and tucked it away so I never have to look at it again:
def distribute_upsample(df, index, freq)
where index is 'timestamp' and freq is '15min'.
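The body of the function is not shown, so the following is only a guess at what such a wrapper might look like, built from the steps described above. It assumes the timestamp column is sorted, unique and aligned to the target grid, that all other columns are numeric, and that the frame has no existing column named group.

```python
import pandas as pd

def distribute_upsample(df, index, freq):
    """Upsample df to freq and split each row's values evenly over its new rows."""
    out = df.copy()
    out["group"] = range(len(out))  # one id per original row
    # Forward-fill so every new row carries its source row's values and id.
    filled = out.set_index(index).resample(freq).ffill()
    # Number of upsampled rows that came from each original row.
    counts = filled.groupby("group")["group"].transform("count")
    values = filled.drop(columns="group")
    return values.div(counts, axis=0).reset_index()

df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2020-01-01 00:00", "2020-01-01 01:00", "2020-01-01 02:00"]
        ),
        "value": [4.0, 8.0, 2.0],
    }
)
result = distribute_upsample(df, "timestamp", "15min")
```

As with the original snippet, the last row has nothing after it to fill into, so it forms a group of one and keeps its full value.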