Given the following example pandas DataFrame x:
              a    b
2014-08-07  0.1  2.0
2014-08-18  0.2  4.0
2014-12-16  0.3  0.0
2015-01-16  0.4  2.3
2015-02-16  0.5  2.1
2015-03-18  0.6  7.0
The index is of type datetime.date.
I want to write a function that takes an argument start of type datetime.datetime and returns the largest index value that is smaller than start.
For example, for start = datetime.datetime(2015, 1, 20, 17, 30), the largest index smaller than start is 2015-01-16.
With x.loc[dt(2015,1,16)] this would give me the latest changes in a and b.
Answer 0 (score: 2)
The pandas asof function is made for exactly this:
x.index.asof(start)
It is available on a Series index or a DatetimeIndex.
See:
http://pandas.pydata.org/pandas-docs/version/0.17.0/generated/pandas.DatetimeIndex.asof.html
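A minimal sketch of how asof might be wrapped into the function the question asks for. The helper name latest_label_asof is hypothetical, and the sketch assumes the index is sorted and that converting the datetime.date labels with pd.to_datetime is acceptable. Note that asof returns the last label less than or equal to start, so a start that falls exactly at midnight on an index date returns that date itself.

```python
import datetime

import pandas as pd

# Rebuild the example frame; the index holds datetime.date objects.
dates = [datetime.date(2014, 8, 7), datetime.date(2014, 8, 18),
         datetime.date(2014, 12, 16), datetime.date(2015, 1, 16),
         datetime.date(2015, 2, 16), datetime.date(2015, 3, 18)]
x = pd.DataFrame({"a": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
                  "b": [2.0, 4.0, 0.0, 2.3, 2.1, 7.0]}, index=dates)


def latest_label_asof(frame, start):
    """Hypothetical helper: last index date <= start, or None if there is none."""
    # Assumption: the index is sorted; asof relies on that.
    idx = pd.to_datetime(frame.index)  # datetime.date labels -> DatetimeIndex
    label = idx.asof(start)            # NaT when start precedes every label
    return None if pd.isna(label) else label.date()


start = datetime.datetime(2015, 1, 20, 17, 30)
label = latest_label_asof(x, start)
print(label)
```

The returned date can then be passed to x.loc[label] to fetch the matching row.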
Answer 1 (score: 1)
Timing the solutions:
Out[4]:
              a    b
2014-08-07  0.1  2.0
2014-08-18  0.2  4.0
2014-12-16  0.3  0.0
2015-01-16  0.4  2.3
2015-02-16  0.5  2.1
2015-03-18  0.6  7.0
In [5]: %timeit df[df.index < pd.to_datetime("2015-09-01")].ix[-1, :]
The slowest run took 5.15 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 620 µs per loop
In [6]: %timeit df.iloc[:df.index.values.searchsorted(np.datetime64("2015-09-01"))].ix[-1, :]
The slowest run took 5.53 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 293 µs per loop
In [7]: %timeit df[:pd.to_datetime("2015-09-01")].ix[-1, :]
The slowest run took 5.66 times longer than the fastest. This could mean that an intermediate result is being cached
1000 loops, best of 3: 450 µs per loop
__main__:6: FutureWarning: TimeSeries is deprecated. Please use Series
In [10]: %timeit alecsolution(df)
1000 loops, best of 3: 503 µs per loop
I think the fastest is:
df.iloc[:df.index.values.searchsorted(np.datetime64("2015-09-01"))].ix[-1, :]
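The timings above use .ix, which has since been removed from pandas. As a rough sketch, the same searchsorted idea could be written with .iloc as below. The helper name last_row_before is hypothetical, and the sketch assumes df has a sorted DatetimeIndex, as the session above appears to.

```python
import datetime

import pandas as pd

# Assumed setup: the same data as above, indexed by a sorted DatetimeIndex.
df = pd.DataFrame(
    {"a": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
     "b": [2.0, 4.0, 0.0, 2.3, 2.1, 7.0]},
    index=pd.to_datetime(["2014-08-07", "2014-08-18", "2014-12-16",
                          "2015-01-16", "2015-02-16", "2015-03-18"]),
)


def last_row_before(frame, start):
    """Hypothetical helper: row whose label is the largest one strictly below start."""
    # side="left" gives the position of the first label >= start,
    # so every label before that position is strictly smaller.
    pos = frame.index.searchsorted(pd.Timestamp(start), side="left")
    if pos == 0:
        return None  # no label lies before start
    return frame.iloc[pos - 1]


row = last_row_before(df, datetime.datetime(2015, 1, 20, 17, 30))
print(row.name, row["a"], row["b"])
```

Because the binary search runs on the index itself, no boolean mask over the whole frame is built.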
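Answer 2 (score: 0)
Here is my solution using a TimeSeries, but it works the same way for a DataFrame.
Basically it iterates over df and, on each iteration, checks whether the date is greater than or equal to 'start'. If not, it saves the date just checked as 'previous'; if so, 'previous' is your result.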
import datetime
import pandas as pd

# pd.TimeSeries was a deprecated alias of pd.Series and has since been removed
df = pd.Series({'2014-08-07': ['0.1', '2.0'],
                '2014-08-18': ['0.2', '4.0'],
                '2014-12-16': ['0.3', '0.0'],
                '2015-01-16': ['0.4', '2.3'],
                '2015-02-16': ['0.5', '2.1'],
                '2015-03-18': ['0.6', '7.0']})
start = datetime.datetime(2015, 1, 20, 17, 30)
result = None
previous_i = None
# .items() replaces the removed Series.iteritems()
for i, row in df.items():
    if pd.to_datetime(i) >= start:
        result = previous_i
        break  # you don't need to check further
    previous_i = i
else:
    # start is later than every index entry, so the last entry is the answer
    result = previous_i
print(result)
>>> 2015-01-16
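The loop above visits entries one by one. On a sorted index the same lookup could instead use binary search. The following is only a sketch with the standard-library bisect module. The function name last_date_before is hypothetical, and it assumes a sorted list of datetime.date values and a naive (timezone-free) start.

```python
import bisect
import datetime


def last_date_before(dates, start):
    """Hypothetical helper: largest date in sorted `dates` strictly before `start`."""
    day = start.date()
    if start.time() > datetime.time.min:
        # start is past midnight, so an index entry equal to `day` still counts
        pos = bisect.bisect_right(dates, day)
    else:
        # start is exactly midnight, so only earlier days are strictly smaller
        pos = bisect.bisect_left(dates, day)
    return dates[pos - 1] if pos > 0 else None


dates = [datetime.date(2014, 8, 7), datetime.date(2014, 8, 18),
         datetime.date(2014, 12, 16), datetime.date(2015, 1, 16),
         datetime.date(2015, 2, 16), datetime.date(2015, 3, 18)]
start = datetime.datetime(2015, 1, 20, 17, 30)
print(last_date_before(dates, start))  # prints 2015-01-16
```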
Answer 3 (score: 0)
x[:start.date()].ix[-1, :]
gives you the entries at the desired index as a pandas Series.