Question

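I have a pandas df with tick data, indexed by datetime64[ns]. I want to resample this data to 5-minute bars, like this: price_5min = price.price.resample('5T').ohlc().between_time('09:00:00', '16:20:00')

It works, but it adds weekends and holidays to the new time series, and I need to remove them. I am not following the US holiday calendar (or any other standard one). I just want to drop the days that are not present in the original price df.

The index is not unique; many ticks share the same timestamp. pandas version 0.20.1.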

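I have tried:

1) dropna(): I have NaN rows that I need to fill, so this does not work.

2) price.index.difference(price_5min.index): gives me all the tick timestamps, not the dates.

3) price.index.date.difference(price_5min.index.date): does not work, because index.date is a numpy.ndarray.

4) price != price_5min: Error: Can only compare identically-labeled DataFrame objects

5) price.index != price_5min.index: Error: Lengths must match to compare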

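Possible approaches to my problem:

a) Somehow compare the dates in the two dataframes and drop rows based on that, but how?

b) Drop the days that show no variation, but how?

c) Something obvious I have not thought of (most likely...)

The price df looks like this: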

                     price  quantity
time                                
2016-06-15 16:19:20  29.85     429.6
2016-06-15 16:19:20  29.85      65.6
2016-06-15 16:19:20  29.85    1351.4
2016-06-15 16:19:30  29.70     729.4
2016-06-15 16:19:30  29.70     287.0
2016-06-15 16:19:30  29.70     219.4
2016-06-15 16:19:49  29.70      47.4
2016-06-15 16:19:52  29.70      11.8
2016-06-16 09:01:42  29.05     350.0
2016-06-16 09:01:42  29.10     189.4
2016-06-16 09:01:45  29.05      33.6
2016-06-16 09:01:54  29.05      33.6
...

Answer 1

I think you can use np.setdiff1d together with numpy.in1d and filter by boolean indexing:

diffs = np.setdiff1d(price_5min.index.date, price.index.date)
df = price_5min[~np.in1d(price_5min.index.date, diffs)]
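The same idea as a self-contained sketch. The tick data here is made up for illustration (two trading days with an empty day in between), and np.isin is swapped in for np.in1d, which NumPy 2.0 deprecates. Instead of computing the set difference first, it keeps the bars whose date occurs in the ticks, which should give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical ticks: 2016-06-15 and 2016-06-17, nothing on 2016-06-16.
price = pd.DataFrame(
    {"price": [29.85, 29.85, 29.70, 29.05, 29.10],
     "quantity": [429.6, 65.6, 729.4, 350.0, 189.4]},
    index=pd.to_datetime([
        "2016-06-15 16:19:20", "2016-06-15 16:19:20", "2016-06-15 16:19:30",
        "2016-06-17 09:01:42", "2016-06-17 09:01:45",
    ]),
)

# "5min" is the same frequency as "5T"; recent pandas versions warn on "5T".
price_5min = (
    price["price"].resample("5min").ohlc().between_time("09:00:00", "16:20:00")
)

# Boolean mask: True where the bar's calendar date also occurs in the ticks.
mask = np.isin(price_5min.index.date, np.unique(price.index.date))
kept = price_5min[mask]
```

Empty bars inside a trading day stay in kept as NaN rows, so they can still be filled afterwards.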

Other solutions, using DatetimeIndex.floor or to_period:

dates = price.index.floor('D')
dates_5min = price_5min.index.floor('D')
df = price_5min[~dates_5min.isin(dates_5min.difference(dates))]

dates = price.index.to_period('D')
dates_5min = price_5min.index.to_period('D')
df = price_5min[~dates_5min.isin(dates_5min.difference(dates))]
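A possible simplification, offered as a sketch rather than a replacement: dropping the rows whose day is in the difference should be the same as keeping the rows whose day is in the tick data, so the difference step can be skipped. The example below checks that on made-up data (the price frame is hypothetical, not the asker's).

```python
import pandas as pd

# Hypothetical ticks: 2016-06-15 and 2016-06-17, nothing on 2016-06-16.
price = pd.DataFrame(
    {"price": [29.85, 29.85, 29.70, 29.05, 29.10]},
    index=pd.to_datetime([
        "2016-06-15 16:19:20", "2016-06-15 16:19:20", "2016-06-15 16:19:30",
        "2016-06-17 09:01:42", "2016-06-17 09:01:45",
    ]),
)
price_5min = (
    price["price"].resample("5min").ohlc().between_time("09:00:00", "16:20:00")
)

dates = price.index.floor("D")
dates_5min = price_5min.index.floor("D")

# Form used in the answer: drop days that only exist after resampling.
via_difference = price_5min[~dates_5min.isin(dates_5min.difference(dates))]

# Shorter form: keep days that exist in the tick data.
direct = price_5min[dates_5min.isin(dates)]

print(via_difference.equals(direct))
```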

Answer 2

Quick solution:

price_5min=price.groupby(price.index.date).resample('5T').ohlc()
price_5min.index = price_5min.index.droplevel(0)
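A more explicit variant of the same per-day idea, as a sketch on made-up data: it loops over the daily groups and concatenates the results, so no droplevel step is needed. It resamples only the price column, as in the question; the price frame below is hypothetical.

```python
import pandas as pd

# Hypothetical ticks: 2016-06-15 and 2016-06-17, nothing on 2016-06-16.
price = pd.DataFrame(
    {"price": [29.85, 29.85, 29.70, 29.05, 29.10],
     "quantity": [429.6, 65.6, 729.4, 350.0, 189.4]},
    index=pd.to_datetime([
        "2016-06-15 16:19:20", "2016-06-15 16:19:20", "2016-06-15 16:19:30",
        "2016-06-17 09:01:42", "2016-06-17 09:01:45",
    ]),
)

# Resample each calendar day on its own, so days without ticks never get bins.
daily_bars = [
    day_ticks.resample("5min").ohlc()
    for _, day_ticks in price["price"].groupby(price.index.date)
]
bars = pd.concat(daily_bars).between_time("09:00:00", "16:20:00")
```

One difference from filtering after a global resample: each day's bins only run from that day's first tick to its last tick, so empty bars before the first or after the last tick of a day are not created. If a full 09:00 to 16:20 grid is needed for every trading day, filtering the globally resampled frame by date is probably the closer fit.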

How do I remove the extra days added by pandas resample?
