First row missing after resampling

Time: 2013-09-27 06:24:02

Tags: pandas

When I resample some data, pandas drops the first row. See the example below. Note that if you move the last timestamp forward by 1 second, it works as expected.

I am using pandas 0.10.1.

import pandas as pd

from datetime import datetime
from StringIO import StringIO


f = StringIO('''\
time,value
2011-06-03 00:00:05,0
2011-06-03 00:01:05,1
2011-06-03 00:02:05,2
''')

series = pd.read_csv(f, parse_dates=True, index_col=0)['value']

print series
# time
# 2011-06-03 00:00:05    0
# 2011-06-03 00:01:05    1
# 2011-06-03 00:02:05    2
# Name: value

# Problem resampling: 1st sample is missing

print series.resample('s')
# time
# 2011-06-03 00:00:06   NaN
# 2011-06-03 00:00:07   NaN
# 2011-06-03 00:00:08   NaN
# 2011-06-03 00:00:09   NaN
# ...
# 2011-06-03 00:01:52   NaN
# 2011-06-03 00:02:03   NaN
# 2011-06-03 00:02:04   NaN
# 2011-06-03 00:02:05     2
# 2011-06-03 00:02:06   NaN
# Freq: S, Name: value, Length: 121

1 answer:

Answer 0 (score: 0)

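The default value of the closed parameter changed in 0.11, see here. I don't know whether there was also a bug in there. You can try specifying closed explicitly.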
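As a rough sketch of what specifying the interval explicitly might look like, the snippet below rebuilds the series from the question and passes closed and label to resample. It assumes a recent pandas release, where resample returns a Resampler object that needs an aggregation such as .mean(). In the 0.10 to 0.12 releases discussed here, resample returned the result directly, so the trailing .mean() would not be used there. The variable names csv and left are only illustrative.

```python
import io

import pandas as pd

# Same three samples as in the question, one minute apart.
csv = io.StringIO("""\
time,value
2011-06-03 00:00:05,0
2011-06-03 00:01:05,1
2011-06-03 00:02:05,2
""")
series = pd.read_csv(csv, parse_dates=True, index_col=0)["value"]

# Upsample to one-second bins, stating which bin edge is closed and
# which edge labels the bin instead of relying on version defaults.
# Assumption: recent pandas, where an aggregation call is required.
left = series.resample("1s", closed="left", label="left").mean()

print(left.head())
print(len(left), "rows,", int(left.notna().sum()), "non-null")
```

If the first row is kept, the first label should be 2011-06-03 00:00:05 and the three original samples should remain the only non-null values.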

The current pandas version is 0.12 (0.13 is coming soon). Your best option is to upgrade.

As of 0.12 this looks fine. The default is closed='left'.

In [11]: df
Out[11]: 
                     value
time                      
2011-06-03 00:00:05      0
2011-06-03 00:01:05      1
2011-06-03 00:02:05      2

In [12]: df.index
Out[12]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-06-03 00:00:05, ..., 2011-06-03 00:02:05]
Length: 3, Freq: None, Timezone: None

In [13]: df.resample('1s')
Out[13]: 
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 121 entries, 2011-06-03 00:00:05 to 2011-06-03 00:02:05
Freq: S
Data columns (total 1 columns):
value    3  non-null values
dtypes: float64(1)

In [14]: df.resample('1s').head()
Out[14]: 
                     value
time                      
2011-06-03 00:00:05      0
2011-06-03 00:00:06    NaN
2011-06-03 00:00:07    NaN
2011-06-03 00:00:08    NaN
2011-06-03 00:00:09    NaN