Question

Given the following example Pandas DataFrame x:

             a    b
2014-08-07   0.1  2.0
2014-08-18   0.2  4.0
2014-12-16   0.3  0.0
2015-01-16   0.4  2.3
2015-02-16   0.5  2.1
2015-03-18   0.6  7.0

The index is of type datetime.date.

I want to write a function that takes a parameter start of type datetime.datetime and returns the largest index value that is smaller than start.

For example, for start = datetime.datetime(2015, 1, 20, 17, 30), the largest index smaller than start is 2015-01-16.

This would give me the latest changes in a and b via x.loc[dt(2015,1,16)].

Answer 1

The pandas asof function is made for exactly this:

x.index.asof(start)

It is available on either a Series index or a DatetimeIndex.

See:

http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DatetimeIndex.asof.html
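As a minimal sketch of the asof approach, the snippet below rebuilds the example frame with a DatetimeIndex (an assumption: the question's index holds datetime.date objects, which are converted here with pd.to_datetime). Note that asof returns the label itself when there is an exact match, so it answers "less than or equal to" rather than strictly "less than", and it expects a sorted index.

```python
import datetime
import pandas as pd

# Example data from the question; the DatetimeIndex is an assumption of this sketch.
x = pd.DataFrame(
    {"a": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
     "b": [2.0, 4.0, 0.0, 2.3, 2.1, 7.0]},
    index=pd.to_datetime(["2014-08-07", "2014-08-18", "2014-12-16",
                          "2015-01-16", "2015-02-16", "2015-03-18"]),
)

start = datetime.datetime(2015, 1, 20, 17, 30)

# asof returns the matching label or, if absent, the closest earlier one.
latest = x.index.asof(pd.Timestamp(start))

# Look up the row for that label.
row = x.loc[latest]
print(latest)
print(row)
```

If no label is at or before start, asof returns a missing value (NaT for a DatetimeIndex), so callers may want to check for that before indexing.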

Answer 2

Testing the solutions:

Out[4]: 
              a    b
2014-08-07  0.1  2.0
2014-08-18  0.2  4.0
2014-12-16  0.3  0.0
2015-01-16  0.4  2.3
2015-02-16  0.5  2.1
2015-03-18  0.6  7.0

In [5]: %timeit df[df.index < pd.to_datetime("2015-09-01")].ix[-1, :]
The slowest run took 5.15 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 620 µs per loop

In [6]: %timeit df.iloc[:df.index.values.searchsorted(np.datetime64("2015-09-01"))].ix[-1, :]
The slowest run took 5.53 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 293 µs per loop

In [7]: %timeit df[:pd.to_datetime("2015-09-01")].ix[-1, :]
The slowest run took 5.66 times longer than the fastest. This could mean that an intermediate result is being cached 
1000 loops, best of 3: 450 µs per loop

__main__:6: FutureWarning: TimeSeries is deprecated. Please use Series
In [10]: %timeit alecsolution(df)
1000 loops, best of 3: 503 µs per loop

I think the fastest is:

df.iloc[:df.index.values.searchsorted(np.datetime64("2015-09-01"))].ix[-1, :]
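The .ix indexer used above was removed in pandas 1.0. The following is a rough sketch of the same searchsorted idea on current pandas, assuming a sorted DatetimeIndex. The helper name last_row_before is hypothetical, and side="left" is chosen so that the result is strictly earlier than the cutoff.

```python
import pandas as pd

# Example data from the question, with a sorted DatetimeIndex (assumption).
df = pd.DataFrame(
    {"a": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
     "b": [2.0, 4.0, 0.0, 2.3, 2.1, 7.0]},
    index=pd.to_datetime(["2014-08-07", "2014-08-18", "2014-12-16",
                          "2015-01-16", "2015-02-16", "2015-03-18"]),
)

def last_row_before(frame, cutoff):
    """Hypothetical helper: last row whose index is strictly before cutoff."""
    # side="left" gives the position of the first label >= cutoff,
    # so the row just before it is the last one strictly earlier.
    pos = frame.index.searchsorted(pd.Timestamp(cutoff), side="left")
    if pos == 0:
        return None  # no label is earlier than cutoff
    return frame.iloc[pos - 1]

print(last_row_before(df, "2015-09-01"))
```

Because searchsorted is a binary search, this avoids building a boolean mask over the whole index, which is consistent with it being the quickest variant in the timings above.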

Answer 3

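Here is my solution using a TimeSeries, but it works the same way for a DataFrame.

Basically it iterates over df and, on each iteration, checks whether the date is on or after 'start'. If it is not, the date just checked is saved as 'previous'; if it is, 'previous' is your result.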

import pandas as pd
import datetime

# pd.TimeSeries was deprecated and later removed; pd.Series replaces it
df = pd.Series({'2014-08-07': ['0.1', '2.0'],
                '2014-08-18': ['0.2', '4.0'],
                '2014-12-16': ['0.3', '0.0'],
                '2015-01-16': ['0.4', '2.3'],
                '2015-02-16': ['0.5', '2.1'],
                '2015-03-18': ['0.6', '7.0']})

start = datetime.datetime(2015, 1, 20, 17, 30)
result = None      # stays None if no date is earlier than start
previous_i = None

for i, row in df.items():  # iteritems() was removed in pandas 2.0
    if pd.to_datetime(i) >= start:
        result = previous_i
        break  # you don't need to check further
    previous_i = i
else:
    result = previous_i  # every date is earlier than start

print(result)


>>> 2015-01-16

Answer 4

x[:start.date()].iloc[-1, :]

returns the entries at the desired index as a Pandas Series.
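One point worth checking with this approach is that label-based slicing includes the end label. The sketch below assumes a sorted DatetimeIndex rather than the datetime.date index from the question, and shows that a date present in the index is itself returned, so the result is "on or before" the date rather than strictly before it.

```python
import datetime
import pandas as pd

# Example data from the question, with a sorted DatetimeIndex (assumption).
x = pd.DataFrame(
    {"a": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
     "b": [2.0, 4.0, 0.0, 2.3, 2.1, 7.0]},
    index=pd.to_datetime(["2014-08-07", "2014-08-18", "2014-12-16",
                          "2015-01-16", "2015-02-16", "2015-03-18"]),
)

start = datetime.datetime(2015, 1, 20, 17, 30)

# Slice up to the date part of start, then take the last row.
row = x.loc[:pd.Timestamp(start.date())].iloc[-1]

# The end label is inclusive: slicing up to an existing date returns that date.
exact = x.loc[:pd.Timestamp("2015-01-16")].iloc[-1]
print(row.name, exact.name)
```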

Pandas: largest index smaller than a date

4 answers: