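I have a problem similar to the one described in "dataframe by index and by integer".
What I want is to take part of a DataFrame by boolean indexing (easy) and then look back a few values, say at the previous index and possibly a few more. Unfortunately, the get_loc answer suggested in the linked question makes my code snippet choke (TypeError in the snippet below) before I can get the actual integer location.
Using the same example as the answer to the other question, this is what I tried: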
df = pd.DataFrame(index=pd.date_range(start=dt.datetime(2015,1,1), end=dt.datetime(2015,2,1)), data={'a':np.arange(32)})
df.index.get_loc(df.index[df['a'] == 1])
*** TypeError: Cannot convert input to TimeStamp
The previous answer passes a string to get_loc; I just want to pass an ordinary index value (here a DateTime).
Answer 0 (score: 2)
Using a slice:
import numpy as np
import pandas as pd
import datetime as DT
index = pd.date_range(start=DT.datetime(2015,1,1), end=DT.datetime(2015,2,1))
df = pd.DataFrame({'a':np.arange(len(index))}, index=index)
mask = df['a'] == 10
idx = np.flatnonzero(mask)[0]
lookback = 3
print(df.iloc[max(idx-lookback, 0):idx+1])
yields
             a
2015-01-08   7
2015-01-09   8
2015-01-10   9
2015-01-11  10
Note that if idx-lookback is negative, then the index refers to elements near the tail of df, just like with Python lists:
In [163]: df.iloc[-3:2]
Out[163]:
Empty DataFrame
Columns: [a]
Index: []
In [164]: df.iloc[0:2]
Out[164]:
            a
2015-01-01  0
2015-01-02  1
Thus, to grab elements relative to the head of df, use max(idx-lookback, 0).
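Relating this back to the question: get_loc expects a single label, whereas df.index[df['a'] == 1] is a DatetimeIndex (even when it holds only one element), which is most likely what triggers the TypeError. A minimal sketch, assuming the mask matches at least one row and the index is unique, passes the first matching label instead:

```python
import numpy as np
import pandas as pd

index = pd.date_range(start="2015-01-01", end="2015-02-01")
df = pd.DataFrame({"a": np.arange(len(index))}, index=index)

mask = df["a"] == 10
label = df.index[mask][0]        # a single Timestamp, not an Index
idx = df.index.get_loc(label)    # integer position (assumes a unique index)

lookback = 3
window = df.iloc[max(idx - lookback, 0): idx + 1]
print(window)
```

Here np.flatnonzero(mask)[0] and df.index.get_loc(label) should give the same position; the flatnonzero form simply skips the round trip through the label.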
Using a boolean mask:
As you know, if you have a boolean array or boolean Series such as
mask = df['a'] == 10
you can select the corresponding rows with
df.loc[mask]
If you wish to select previous or succeeding rows shifted by a fixed amount, you could use mask.shift to shift the mask:
df.loc[mask.shift(-lookback).fillna(False)]
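To make the direction of the shift concrete, here is a small sketch on the same frame. It uses the fill_value argument of Series.shift in place of fillna(False); that substitution is an assumption about the installed pandas version, since fill_value is only available in reasonably recent releases:

```python
import numpy as np
import pandas as pd

index = pd.date_range(start="2015-01-01", end="2015-02-01")
df = pd.DataFrame({"a": np.arange(len(index))}, index=index)

mask = df["a"] == 10
lookback = 3

# A negative shift moves the True flag to an earlier row,
# so this selects the row `lookback` positions before the match.
earlier = df.loc[mask.shift(-lookback, fill_value=False)]

# A positive shift moves the flag to a later row.
later = df.loc[mask.shift(lookback, fill_value=False)]

print(earlier)
print(later)
```

Note that each of these selects only the single shifted row per match, not the whole range between the shifted row and the match.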
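If you wish to select the lookback preceding rows as well, then you could expand the mask by unioning it with shifted copies of the original mask:
lookback = 3
orig = mask.copy()
for i in range(1, lookback + 1):
    # shift the original mask (not the running result) i rows earlier
    mask |= orig.shift(-i, fill_value=False)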
or, equivalently (as long as every match has at least lookback rows before it), use cumsum:
m = mask.astype(int)
mask = (m.shift(-lookback, fill_value=0) - m.shift(1, fill_value=0)).cumsum().astype(bool)
The for-loop is clearer, but the cumsum expression is faster, particularly if lookback is large.
For example,
import numpy as np
import pandas as pd
import datetime as DT
df = pd.DataFrame(
    index=pd.date_range(start=DT.datetime(2015,1,1), end=DT.datetime(2015,2,1)),
    data={'a':np.arange(32)})
mask = df['a'] == 10
lookback = 3
orig = mask.copy()
for i in range(1, lookback + 1):
    # OR in the original mask shifted i rows earlier
    mask |= orig.shift(-i, fill_value=False)
# alternatively, instead of the loop:
# m = orig.astype(int)
# mask = (m.shift(-lookback, fill_value=0) - m.shift(1, fill_value=0)).cumsum().astype(bool)
print(df.loc[mask])
yields
             a
2015-01-08   7
2015-01-09   8
2015-01-10   9
2015-01-11  10
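If the mask can match several rows, or a match can sit within lookback rows of the start of the frame, a position-based variant may be easier to reason about. The following is only a sketch: expand_mask_back is a hypothetical helper name (not part of pandas), and it combines the flatnonzero idea from the slice approach with the mask approach:

```python
import numpy as np
import pandas as pd


def expand_mask_back(mask, lookback):
    """Hypothetical helper: also flag the `lookback` rows before each True."""
    out = mask.to_numpy(dtype=bool).copy()
    hits = np.flatnonzero(out)           # integer positions of the matches
    for offset in range(1, lookback + 1):
        earlier = hits - offset
        # drop positions that would fall before the first row
        out[earlier[earlier >= 0]] = True
    return pd.Series(out, index=mask.index)


index = pd.date_range(start="2015-01-01", end="2015-02-01")
df = pd.DataFrame({"a": np.arange(len(index))}, index=index)

mask = df["a"].isin([1, 10])             # two matches, one near the head
expanded = expand_mask_back(mask, lookback=3)
print(df.loc[expanded])
```

Clipping negative positions explicitly plays the same role as max(idx-lookback, 0) in the slice version, so a match close to the head does not wrap around to the tail.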