Need pandas DataFrame.resample() to label bins by the start datetime of the subsampled series

Asked: 2014-02-24 22:58:09

Tags: python pandas

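I am using the pandas DataFrame.resample() function to downsample 1-minute time series data to a 15-minute frequency. The raw data consists of multiple time series at the same 1-minute frequency, where each series is a list of tuples, each defined as (<offset from start time>, <value>). I convert these to (<datetime>, <value>) before populating the DataFrame. Here is an example time series: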

start = datetime(2014, 2, 24, 1, 6, 0, tzinfo=pytz.utc)
min_ts = dict((start + timedelta(seconds=60) * t, random.randint(0,3)) for t in range(1, 30))

 min_ts = 
 {datetime.datetime(2014, 2, 24, 1, 7, tzinfo=<UTC>): 2,
 datetime.datetime(2014, 2, 24, 1, 8, tzinfo=<UTC>): 1,
 datetime.datetime(2014, 2, 24, 1, 9, tzinfo=<UTC>): 0,
 datetime.datetime(2014, 2, 24, 1, 10, tzinfo=<UTC>): 3,
 datetime.datetime(2014, 2, 24, 1, 11, tzinfo=<UTC>): 1,
 datetime.datetime(2014, 2, 24, 1, 12, tzinfo=<UTC>): 0,
 datetime.datetime(2014, 2, 24, 1, 13, tzinfo=<UTC>): 1,
 datetime.datetime(2014, 2, 24, 1, 14, tzinfo=<UTC>): 0,
 datetime.datetime(2014, 2, 24, 1, 15, tzinfo=<UTC>): 3,
 datetime.datetime(2014, 2, 24, 1, 16, tzinfo=<UTC>): 3,
 datetime.datetime(2014, 2, 24, 1, 17, tzinfo=<UTC>): 3,
 datetime.datetime(2014, 2, 24, 1, 18, tzinfo=<UTC>): 1,
 datetime.datetime(2014, 2, 24, 1, 19, tzinfo=<UTC>): 3,
 datetime.datetime(2014, 2, 24, 1, 20, tzinfo=<UTC>): 0,
 datetime.datetime(2014, 2, 24, 1, 21, tzinfo=<UTC>): 2,
 datetime.datetime(2014, 2, 24, 1, 22, tzinfo=<UTC>): 1,
 datetime.datetime(2014, 2, 24, 1, 23, tzinfo=<UTC>): 0,
 datetime.datetime(2014, 2, 24, 1, 24, tzinfo=<UTC>): 3,
 datetime.datetime(2014, 2, 24, 1, 25, tzinfo=<UTC>): 1,
 datetime.datetime(2014, 2, 24, 1, 26, tzinfo=<UTC>): 1,
 datetime.datetime(2014, 2, 24, 1, 27, tzinfo=<UTC>): 2,
 datetime.datetime(2014, 2, 24, 1, 28, tzinfo=<UTC>): 0,
 datetime.datetime(2014, 2, 24, 1, 29, tzinfo=<UTC>): 3,
 datetime.datetime(2014, 2, 24, 1, 30, tzinfo=<UTC>): 2,
 datetime.datetime(2014, 2, 24, 1, 31, tzinfo=<UTC>): 3,
 datetime.datetime(2014, 2, 24, 1, 32, tzinfo=<UTC>): 0,
 datetime.datetime(2014, 2, 24, 1, 33, tzinfo=<UTC>): 3,
 datetime.datetime(2014, 2, 24, 1, 34, tzinfo=<UTC>): 2,
 datetime.datetime(2014, 2, 24, 1, 35, tzinfo=<UTC>): 0}

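The problem is that when I load this into a DataFrame and resample at a 15-minute frequency, summing the values that fall in each bin, the DatetimeIndex labels are forced onto the quarter hours (i.e. minutes 0, 15, 30, 45). What I want is to keep the alignment of the original series (i.e. bins starting at datetime.datetime(2014, 2, 24, 1, 7, tzinfo=<UTC>)). I tried the loffset argument of resample, which gives the DatetimeIndex labels I want, but the summed values do not change accordingly.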

df = pd.DataFrame({'values': min_ts})
df.resample('15min', how='sum', label='right')

df = 
DateTimeIndex                  values
--------------------------------------
2014-02-24 01:15:00+00:00    11
2014-02-24 01:30:00+00:00    31
2014-02-24 01:45:00+00:00    11

The result I want is:

df = 
DateTimeIndex                  values
--------------------------------------
2014-02-24 01:07:00+00:00    23
2014-02-24 01:22:00+00:00    21

(Updated to reflect the desired result more clearly.)
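The gap between the two outputs comes down to where the bin edges are anchored. The following is a minimal stdlib-only sketch of that arithmetic, not pandas' actual implementation. It uses the sample values listed above; bin_sums and its origin parameter are hypothetical names introduced here for illustration, and datetime.timezone.utc stands in for pytz.utc.

```python
from datetime import datetime, timedelta, timezone

# Sample values for 01:07 .. 01:35, copied from the min_ts listing above.
values = [2, 1, 0, 3, 1, 0, 1, 0, 3, 3, 3, 1, 3, 0, 2,
          1, 0, 3, 1, 1, 2, 0, 3, 2, 3, 0, 3, 2, 0]
start = datetime(2014, 2, 24, 1, 6, tzinfo=timezone.utc)
min_ts = {start + timedelta(minutes=t): v for t, v in enumerate(values, start=1)}


def bin_sums(series, width, origin):
    """Sum values into left-closed bins whose edges sit at origin + k * width.

    Hypothetical helper for illustration only; it is not part of pandas.
    """
    sums = {}
    for ts, value in sorted(series.items()):
        # Floor the timestamp to the nearest bin edge at or before it.
        bin_start = origin + ((ts - origin) // width) * width
        sums[bin_start] = sums.get(bin_start, 0) + value
    return sums


width = timedelta(minutes=15)

# Edges anchored at midnight: bins start at 01:00, 01:15, 01:30.
midnight = start.replace(hour=0, minute=0)
clock_aligned = bin_sums(min_ts, width, origin=midnight)

# Edges anchored at the first sample: bins start at 01:07, 01:22.
series_aligned = bin_sums(min_ts, width, origin=min(min_ts))

for label, total in series_aligned.items():
    print(label.isoformat(), total)
```

With these values the midnight-anchored sums are 8, 26 and 10, while the series-anchored sums are 23 and 21, which matches the desired result above. Moving only the labels cannot turn one set of sums into the other, because the samples are grouped differently.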

1 answer:

Answer 0 (score: 1)

Try using base or loffset, and/or switching the label to left (this uses a different random seed from yours, so the values differ).

In [17]: df.resample('15min', how='sum', label='right')
Out[17]: 
                           values
2014-02-24 01:15:00+00:00      10
2014-02-24 01:30:00+00:00      17
2014-02-24 01:45:00+00:00       7

[3 rows x 1 columns]

In [18]: df.resample('15min', how='sum', label='right',base=7)
Out[18]: 
                           values
2014-02-24 01:22:00+00:00      16
2014-02-24 01:37:00+00:00      18

[2 rows x 1 columns]

In [19]: df.resample('15min', how='sum', label='left',base=7)
Out[19]: 
                           values
2014-02-24 01:07:00+00:00      16
2014-02-24 01:22:00+00:00      18

[2 rows x 1 columns]

In [21]: df.resample('15min', how='sum', label='right',loffset='7T')
Out[21]: 
                           values
2014-02-24 01:22:00+00:00      10
2014-02-24 01:37:00+00:00      17
2014-02-24 01:52:00+00:00       7

[3 rows x 1 columns]

In [22]: df.resample('15min', how='sum', label='left',loffset='7T')
Out[22]: 
                           values
2014-02-24 01:07:00+00:00      10
2014-02-24 01:22:00+00:00      17
2014-02-24 01:37:00+00:00       7

[3 rows x 1 columns]
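In the transcripts above, base=7 moves the bin edges, so the sums change, whereas loffset only relabels bins that were already computed, so the sums stay the same. The how, base and loffset arguments were later deprecated and are gone as of pandas 2.0. The sketch below shows a rough equivalent, assuming pandas 1.1 or later, where resample accepts origin and offset. It uses the sample values from the question rather than the random data in this answer.

```python
import pandas as pd

# Sample values for 01:07 .. 01:35, copied from the question's min_ts listing.
values = [2, 1, 0, 3, 1, 0, 1, 0, 3, 3, 3, 1, 3, 0, 2,
          1, 0, 3, 1, 1, 2, 0, 3, 2, 3, 0, 3, 2, 0]
index = pd.date_range("2014-02-24 01:07", periods=len(values), freq="min", tz="UTC")
df = pd.DataFrame({"values": values}, index=index)

# Default: bin edges on the quarter hours, labelled by the left edge.
default_bins = df.resample("15min").sum()

# Shift the bin edges by 7 minutes (plays the role of base=7 above).
shifted_bins = df.resample("15min", offset="7min").sum()

# Alternatively, anchor the bins at the first timestamp of the series.
start_bins = df.resample("15min", origin="start").sum()

# Relabel only (plays the role of loffset='7T'): same sums, shifted index.
relabelled = default_bins.copy()
relabelled.index = relabelled.index + pd.Timedelta(minutes=7)

print(shifted_bins)
```

For a series that starts at 01:07, offset="7min" and origin="start" give the same two bins, labelled 01:07 and 01:22. origin="start" has the advantage that the 7-minute shift does not need to be computed by hand for each series.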