Retrieve the past n rows of a pandas DataFrame based on a column value

Asked: 2018-08-10 08:09:31

Tags: python-3.x pandas

I have the pandas DataFrame df shown below.

         pay        pay2        date
0  209.070007  208.000000   2018-08-06
1  207.110001  209.320007   2018-08-07
2  207.250000  206.050003   2018-08-08
3  208.880005  207.279999   2018-08-09

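Given a last_date, I want to retrieve the n rows leading up to last_date, including the row for last_date itself.

For example, given last_date = '2018-08-08' and n = 3, the resulting DataFrame should look like this: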

         pay        pay2        date
0  209.070007  208.000000   2018-08-06
1  207.110001  209.320007   2018-08-07
2  207.250000  206.050003   2018-08-08

3 answers:

Answer 0 (score: 2)

Use

In [285]: s = df.date.eq(last_date)

In [286]: loc = s.index[s][-1]    # or s.idxmax()

In [287]: df.loc[loc-2:loc]    # n = 3 rows; .loc includes both endpoints, so start at loc-(n-1)
Out[287]:
          pay        pay2        date
0  209.070007  208.000000  2018-08-06
1  207.110001  209.320007  2018-08-07
2  207.250000  206.050003  2018-08-08

Details

In [288]: s
Out[288]:
0    False
1    False
2     True
3    False
Name: date, dtype: bool

In [289]: loc
Out[289]: 2
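
The label arithmetic above relies on df having the default 0..N-1 index. Below is a minimal sketch of the same idea using positions instead of labels. The helper name last_n_rows is hypothetical, and the date column is assumed to hold strings as in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "pay": [209.070007, 207.110001, 207.250000, 208.880005],
    "pay2": [208.000000, 209.320007, 206.050003, 207.279999],
    "date": ["2018-08-06", "2018-08-07", "2018-08-08", "2018-08-09"],
})


def last_n_rows(frame, last_date, n):
    """Return up to n rows ending at the last row whose date equals last_date.

    Hypothetical helper for illustration. It works on positions, so the
    index does not have to be 0..N-1.
    """
    matches = np.flatnonzero(frame["date"].eq(last_date).to_numpy())
    if matches.size == 0:
        return frame.iloc[0:0]      # no such date: return an empty frame
    end = int(matches[-1]) + 1      # iloc slices exclude the end position
    start = max(end - n, 0)         # clamp so the slice never wraps around
    return frame.iloc[start:end]


result = last_n_rows(df, "2018-08-08", 3)
print(result)
```

Clamping the start position means that asking for more rows than exist before last_date simply returns the rows that are available.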

Answer 1 (score: 2)

I would do it a bit more concisely. First convert the string to a datetime.

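last_date = '2018-08-08'
last_date = pd.to_datetime(last_date)

n = 2
# Keep rows on or before last_date, then take the last n of them
# (assumes df['date'] is already datetime and sorted ascending)
df.loc[df['date'] <= last_date].tail(n)

#           pay        pay2       date
# 1  207.110001  209.320007 2018-08-07
# 2  207.250000  206.050003 2018-08-08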
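
After filtering on the date, slicing from the top and taking the tail select different rows, which matters for the "past n rows" requirement. The following is a small sketch of the difference, assuming the question's data with the date column already converted to datetime; the names filtered, first_n and last_n are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "pay": [209.070007, 207.110001, 207.250000, 208.880005],
    "pay2": [208.000000, 209.320007, 206.050003, 207.279999],
    "date": pd.to_datetime(
        ["2018-08-06", "2018-08-07", "2018-08-08", "2018-08-09"]
    ),
})

last_date = pd.to_datetime("2018-08-08")
n = 2

# Rows on or before last_date
filtered = df.loc[df["date"] <= last_date]

first_n = filtered.loc[:n - 1]  # labels 0..n-1: the earliest rows
last_n = filtered.tail(n)       # the n rows closest to last_date

print(first_n)
print(last_n)
```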

Answer 2 (score: 1)

You need:

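import pandas as pd

df = pd.DataFrame({'pay': ['209.07', '207.110001', '207.250000', '208.880005'],
                   'pay2': ['208', '209.320007', '206.050003', '207.279999'],
                   'date': ['2018-08-06', '2018-08-07', '2018-08-08', '2018-08-09']})

last_date = pd.to_datetime('2018-08-08')
n = 3

# Convert the date strings so they compare correctly with last_date
df['date'] = pd.to_datetime(df['date'])
# Keep rows on or before last_date, newest first
df_new = df[df['date'] <= last_date].sort_values("date", ascending=False)

# Take the n newest rows, then restore chronological order
df_new = df_new[:n].sort_values("date", ascending=True)

print(df_new)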

Output:

          pay        pay2       date
0      209.07         208 2018-08-06
1  207.110001  209.320007 2018-08-07
2  207.250000  206.050003 2018-08-08
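
A possible alternative, sketched here under the assumption that it is acceptable to use the dates as the index temporarily: with a sorted DatetimeIndex, a label slice up to last_date includes last_date itself, and tail(n) then keeps the most recent n rows. The variable names indexed and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "pay": [209.070007, 207.110001, 207.250000, 208.880005],
    "pay2": [208.000000, 209.320007, 206.050003, 207.279999],
    "date": ["2018-08-06", "2018-08-07", "2018-08-08", "2018-08-09"],
})
df["date"] = pd.to_datetime(df["date"])

last_date = pd.Timestamp("2018-08-08")
n = 3

# Sort by date so that label slicing on the index is well defined
indexed = df.set_index("date").sort_index()

# .loc[:last_date] includes last_date; tail(n) keeps the newest n rows
result = indexed.loc[:last_date].tail(n).reset_index()

print(result)
```

Note that reset_index moves date back to a regular column, placed first, and renumbers the rows from 0.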