Question

My data looks like this (it is a pandas DataFrame):

```
price =

time                bid
03:03:34.797000     116.02
03:03:34.797000     116.02
03:03:54.152000     116.02
03:03:54.169000     116.02
03:03:54.169000     116.02
03:07:36.899000     116.24
03:07:48.760000     116.24
03:07:48.760000     116.24
03:07:48.761000     116.24
```

I am trying to resample the data to minute-level bars and align each value to the nearest minute that is not earlier than its original timestamp. I expect the result to be:

```
03:04:00    116.02
03:05:00    NaN
03:06:00    NaN
03:07:00    NaN
03:08:00    116.24
```

But I am using:

```python
price.resample('Min').last()
```

Everything works except the alignment. Can anyone help me fix this? Thanks.
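To make the problem reproducible, here is a minimal sketch that rebuilds the sample ticks as a DataFrame and runs the same kind of resample call. It assumes a `DatetimeIndex` named `time` and an arbitrary calendar date (2017-12-06, borrowed from Answer 2's output), because the question shows only times of day. Under those assumptions the default call labels each bin by its left edge, so the first tick lands on 03:03 rather than 03:04.

```python
import pandas as pd

# Sample ticks from the question. The date 2017-12-06 is an assumption:
# the question only shows times of day.
times = [
    "03:03:34.797", "03:03:34.797", "03:03:54.152", "03:03:54.169",
    "03:03:54.169", "03:07:36.899", "03:07:48.760", "03:07:48.760",
    "03:07:48.761",
]
price = pd.DataFrame(
    {"bid": [116.02] * 5 + [116.24] * 4},
    index=pd.to_datetime(["2017-12-06 " + t for t in times]).rename("time"),
)

# Default bins are closed on the left and labelled by their left edge,
# so the labels run 03:03, 03:04, ..., 03:07 instead of 03:04, ..., 03:08.
res = price.resample("min").last()
print(res)
```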

Answer 1

You need to use `floor`:

```python
df.groupby(df.index.floor('Min')).last().resample('Min').asfreq()
```

For better speed, try this (requires pandas 0.21.0+):

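```python
# pandas >= 1.0; on 0.21-0.25 pass inplace=False (the keyword was removed in pandas 2.0)
out = df.set_axis(df.index.floor('Min'), axis=0)
# Keep the last row of each minute, deduplicating by index label rather than by row values
out = out[~out.index.duplicated(keep='last')].resample('Min').asfreq()
```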

Output:

```
             bid
time            
03:03:00  116.02
03:04:00     NaN
03:05:00     NaN
03:06:00     NaN
03:07:00  116.24
```
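`floor` maps each tick to the minute at or before it, which is why the output above starts at 03:03 and ends at 03:07. If the goal is the question's "not earlier than the original time" alignment, a minimal sketch is to swap `floor` for `ceil` in the same groupby pattern. The rebuilt data and the 2017-12-06 date are assumptions, since the question shows only times of day.

```python
import pandas as pd

# Rebuild the question's ticks on an assumed date (2017-12-06).
times = ["03:03:34.797", "03:03:34.797", "03:03:54.152", "03:03:54.169",
         "03:03:54.169", "03:07:36.899", "03:07:48.760", "03:07:48.760",
         "03:07:48.761"]
df = pd.DataFrame(
    {"bid": [116.02] * 5 + [116.24] * 4},
    index=pd.to_datetime(["2017-12-06 " + t for t in times]).rename("time"),
)

# ceil sends each timestamp to the first whole minute at or after it,
# then resample(...).asfreq() fills the empty minutes in between with NaN.
out = df.groupby(df.index.ceil("min")).last().resample("min").asfreq()
print(out)
```

A tick that sits exactly on a minute boundary is left unchanged by `ceil`, so it keeps its own minute.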

Answer 2

```python
df.groupby(df['time'].dt.round('1min'))['bid'].mean().asfreq('Min')
```

```
time
2017-12-06 03:04:00    116.02
2017-12-06 03:05:00       NaN
2017-12-06 03:06:00       NaN
2017-12-06 03:07:00       NaN
2017-12-06 03:08:00    116.24
Freq: T, Name: bid, dtype: float64
```
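Rounding matches the expected output here only because every tick in the sample falls in the second half of its minute (34.797 s, 54.152 s, 36.899 s, 48.760 s, ...). A tick in the first half of a minute is rounded down, to a time earlier than the original, which the question rules out. The two timestamps below are hypothetical, chosen only to show the difference between `round` and `ceil`. Note also that the answer aggregates with `mean()` rather than `last()`; that gives the same numbers here only because each minute contains a single bid value.

```python
import pandas as pd

# Two hypothetical ticks in the same minute (not taken from the question).
t = pd.Series(pd.to_datetime(["2017-12-06 03:03:20", "2017-12-06 03:03:40"]))

rounded = t.dt.round("min")  # 03:03:00 (earlier than 03:03:20) and 03:04:00
ceiled = t.dt.ceil("min")    # 03:04:00 for both: never earlier than the tick
print(rounded)
print(ceiled)
```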

Answer 3

I tried the following solution, which runs faster:

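```python
import pandas as pd

# Assumes a TimedeltaIndex: Timedelta exposes .seconds and .microseconds
df = df.resample('Min').last()
offset_mc = df.index[0].microseconds
offset_sec = df.index[0].seconds % 60
# If the first label is not on a whole minute, shift every label up to the next minute
if not (offset_mc == 0 and offset_sec == 0):
    df.index += pd.Timedelta(seconds=59 - offset_sec, microseconds=1000000 - offset_mc)
```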
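Instead of shifting the labels by hand, `resample` can be asked to close each one-minute bin on the right and label it by its right edge, so a tick at 03:03:34.797 falls into (03:03, 03:04] and is reported at 03:04. This is a sketch assuming a `DatetimeIndex`, rebuilt here on an assumed date (2017-12-06). If the index is a `TimedeltaIndex`, as the `.seconds`/`.microseconds` attributes in the code above suggest, the bin edges may be anchored differently and this may not apply as-is.

```python
import pandas as pd

# Rebuild the question's ticks on an assumed date (2017-12-06).
times = ["03:03:34.797", "03:03:34.797", "03:03:54.152", "03:03:54.169",
         "03:03:54.169", "03:07:36.899", "03:07:48.760", "03:07:48.760",
         "03:07:48.761"]
price = pd.DataFrame(
    {"bid": [116.02] * 5 + [116.24] * 4},
    index=pd.to_datetime(["2017-12-06 " + t for t in times]).rename("time"),
)

# closed="right": bins are (03:03, 03:04], (03:04, 03:05], ...
# label="right": each bin is reported at its right edge
out = price.resample("min", closed="right", label="right").last()
print(out)
```

With right-closed bins, a tick exactly at 03:04:00 belongs to (03:03, 03:04] and is labelled 03:04, which keeps the "not earlier than" rule.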

How to resample and round each index to the closest second?
