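I am using the pandas DataFrame.resample() function to downsample 1-minute time series data to a 15-minute frequency. The raw data consists of multiple time series at the same 1-minute frequency, where each series is a list of tuples of the form (<offset from start time>, <value>). I convert these to (<datetime>, <value>) before populating the DataFrame. Here is an example time series: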
import random
from datetime import datetime, timedelta

import pandas as pd
import pytz

start = datetime(2014, 2, 24, 1, 6, 0, tzinfo=pytz.utc)
min_ts = dict((start + timedelta(seconds=60) * t, random.randint(0, 3)) for t in range(1, 30))
min_ts =
{datetime.datetime(2014, 2, 24, 1, 7, tzinfo=<UTC>): 2,
datetime.datetime(2014, 2, 24, 1, 8, tzinfo=<UTC>): 1,
datetime.datetime(2014, 2, 24, 1, 9, tzinfo=<UTC>): 0,
datetime.datetime(2014, 2, 24, 1, 10, tzinfo=<UTC>): 3,
datetime.datetime(2014, 2, 24, 1, 11, tzinfo=<UTC>): 1,
datetime.datetime(2014, 2, 24, 1, 12, tzinfo=<UTC>): 0,
datetime.datetime(2014, 2, 24, 1, 13, tzinfo=<UTC>): 1,
datetime.datetime(2014, 2, 24, 1, 14, tzinfo=<UTC>): 0,
datetime.datetime(2014, 2, 24, 1, 15, tzinfo=<UTC>): 3,
datetime.datetime(2014, 2, 24, 1, 16, tzinfo=<UTC>): 3,
datetime.datetime(2014, 2, 24, 1, 17, tzinfo=<UTC>): 3,
datetime.datetime(2014, 2, 24, 1, 18, tzinfo=<UTC>): 1,
datetime.datetime(2014, 2, 24, 1, 19, tzinfo=<UTC>): 3,
datetime.datetime(2014, 2, 24, 1, 20, tzinfo=<UTC>): 0,
datetime.datetime(2014, 2, 24, 1, 21, tzinfo=<UTC>): 2,
datetime.datetime(2014, 2, 24, 1, 22, tzinfo=<UTC>): 1,
datetime.datetime(2014, 2, 24, 1, 23, tzinfo=<UTC>): 0,
datetime.datetime(2014, 2, 24, 1, 24, tzinfo=<UTC>): 3,
datetime.datetime(2014, 2, 24, 1, 25, tzinfo=<UTC>): 1,
datetime.datetime(2014, 2, 24, 1, 26, tzinfo=<UTC>): 1,
datetime.datetime(2014, 2, 24, 1, 27, tzinfo=<UTC>): 2,
datetime.datetime(2014, 2, 24, 1, 28, tzinfo=<UTC>): 0,
datetime.datetime(2014, 2, 24, 1, 29, tzinfo=<UTC>): 3,
datetime.datetime(2014, 2, 24, 1, 30, tzinfo=<UTC>): 2,
datetime.datetime(2014, 2, 24, 1, 31, tzinfo=<UTC>): 3,
datetime.datetime(2014, 2, 24, 1, 32, tzinfo=<UTC>): 0,
datetime.datetime(2014, 2, 24, 1, 33, tzinfo=<UTC>): 3,
datetime.datetime(2014, 2, 24, 1, 34, tzinfo=<UTC>): 2,
datetime.datetime(2014, 2, 24, 1, 35, tzinfo=<UTC>): 0}
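The conversion from (<offset from start time>, <value>) tuples to (<datetime>, <value>) described above is not shown in the question. A minimal sketch of one way it might look follows; it assumes the offsets are whole minutes counted from the start time, and the names `offsets_to_series`, `pairs` and `unit` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd


def offsets_to_series(start, pairs, unit=timedelta(minutes=1)):
    """Turn (offset, value) pairs into a Series indexed by absolute timestamps.

    Assumption: each offset is a whole multiple of `unit` counted from `start`.
    """
    index = pd.DatetimeIndex([start + unit * offset for offset, _ in pairs])
    values = [value for _, value in pairs]
    return pd.Series(values, index=index, name="values")


start = datetime(2014, 2, 24, 1, 6, 0, tzinfo=timezone.utc)
raw = [(1, 2), (2, 1), (3, 0), (4, 3)]  # first four points of the sample above
df = offsets_to_series(start, raw).to_frame()
```

Building the index explicitly keeps the timestamps timezone-aware, so the resulting DataFrame can be passed straight to resample().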
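The problem I'm having is that when I load this into a DataFrame and resample at a 15-minute frequency, summing the values in each interval, the DatetimeIndex labels are forced onto the quarter-hour marks (i.e. 0, 15, 30, 45), whereas I want to keep the original time series' DatetimeIndex (i.e. starting at datetime.datetime(2014, 2, 24, 1, 7, tzinfo=<UTC>)). I tried the loffset parameter of resample, which gives the index labels I want, but the summed values do not change accordingly.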
df = pd.DataFrame({'values': min_ts})
df.resample('15min', how='sum', label='right')
df =
DateTimeIndex values
--------------------------------------
2014-02-24 01:15:00+00:00 11
2014-02-24 01:30:00+00:00 31
2014-02-24 01:45:00+00:00 11
The result I want is:
df =
DateTimeIndex values
--------------------------------------
2014-02-24 01:07:00+00:00 23
2014-02-24 01:22:00+00:00 21
(Updated to reflect the desired result more clearly.)
Answer 0 (score: 1)
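Try using base, loffset, and/or switching label to left. base shifts the bin edges themselves, so the sums change; loffset only shifts the labels of the default bins, so the sums stay the same. (The output below uses a different random seed from yours, so the values differ.)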
In [17]: df.resample('15min', how='sum', label='right')
Out[17]:
values
2014-02-24 01:15:00+00:00 10
2014-02-24 01:30:00+00:00 17
2014-02-24 01:45:00+00:00 7
[3 rows x 1 columns]
In [18]: df.resample('15min', how='sum', label='right',base=7)
Out[18]:
values
2014-02-24 01:22:00+00:00 16
2014-02-24 01:37:00+00:00 18
[2 rows x 1 columns]
In [19]: df.resample('15min', how='sum', label='left',base=7)
Out[19]:
values
2014-02-24 01:07:00+00:00 16
2014-02-24 01:22:00+00:00 18
[2 rows x 1 columns]
In [21]: df.resample('15min', how='sum', label='right',loffset='7T')
Out[21]:
values
2014-02-24 01:22:00+00:00 10
2014-02-24 01:37:00+00:00 17
2014-02-24 01:52:00+00:00 7
[3 rows x 1 columns]
In [22]: df.resample('15min', how='sum', label='left',loffset='7T')
Out[22]:
values
2014-02-24 01:07:00+00:00 10
2014-02-24 01:22:00+00:00 17
2014-02-24 01:37:00+00:00 7
[3 rows x 1 columns]
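The how, base and loffset arguments used above belong to an older resample API. As far as I know, how= was replaced by calling the aggregation on the resampler (for example .sum()), and base and loffset were deprecated in pandas 1.1 in favour of origin and offset, then removed in 2.0. A rough sketch of the equivalent calls on a recent version follows; the data is a deterministic stand-in with made-up values, not the question's random sample.

```python
import pandas as pd

# Stand-in data (assumption): 29 one-minute samples starting at 01:07 UTC,
# with made-up values cycling through 0..3.
index = pd.date_range("2014-02-24 01:07", periods=29, freq="min", tz="UTC")
df = pd.DataFrame({"values": [i % 4 for i in range(29)]}, index=index)

# Default behaviour: bin edges snap to :00, :15, :30, :45.
snapped = df.resample("15min", label="left").sum()

# Rough equivalent of base=7: shift the bin edges by seven minutes.
shifted = df.resample("15min", offset="7min", label="left").sum()

# Anchor the bins at the first timestamp, whatever minute it falls on.
anchored = df.resample("15min", origin="start", label="left").sum()

# Rough equivalent of loffset='7T': keep the default bins, move only the labels.
relabelled = snapped.copy()
relabelled.index = relabelled.index + pd.Timedelta(minutes=7)
```

With this stand-in data, shifted and anchored both produce two bins labelled 01:07 and 01:22, while relabelled still has the three default bins with labels moved to 01:07, 01:22 and 01:37. origin="start" is probably the closest match to the question's goal, since it does not require knowing the starting minute in advance.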