如何获取Pandas DataFrame的最近5个月的数据?

时间:2017-04-28 11:09:40

标签: python pandas dataframe

我有一个Pandas数据框,它是使用QPython从KDB数据库中提取数据生成的。

首先,Date列以奇怪的dtype形式返回:dtype('<M8[ns]')

df = conn.sync("select Date, Open, High, Low, Close from stocktable", pandas=True)
df["Date"].dtype
# dtype('<M8[ns]')

但是,当我检查列的内容时,底行显示dtype为datetime。

0      2017-04-17
1      2017-04-13
2      2017-04-12
3      2017-04-11
4      2017-04-10
5      2017-04-07
6      2017-04-06
7      2017-04-05
8      2017-04-04
9      2017-04-03
10     2017-03-31
11     2017-03-30
          ...    

3180   2004-08-27
3181   2004-08-26
3182   2004-08-25
3183   2004-08-24
3184   2004-08-23
3185   2004-08-20
3186   2004-08-19
Name: Date, dtype: datetime64[ns]

此外,方法last()无法正常工作。我要求过去5个月的数据,但所有数据都会被返回。

# Expected to only return last 5 months of data, but returns it all.
df.set_index("Date").last("5M")

如何获取此DataFrame的最后一行?

3 个答案:

答案 0 :(得分:3)

它适合我。

演示:

In [71]: from pandas_datareader import data as web

In [72]: df = web.DataReader('AAPL', 'yahoo', '2010-04-01')

In [73]: df
Out[73]:
                  Open        High         Low       Close     Volume   Adj Close
Date
2010-04-01  237.410000  238.730003  232.750000  235.969994  150786300   30.572166
2010-04-05  234.980011  238.509998  234.769993  238.489998  171126900   30.898657
2010-04-06  238.200005  240.239998  237.000004  239.540009  111754300   31.034696
2010-04-07  239.549995  241.920010  238.659988  240.600006  157125500   31.172029
2010-04-08  240.440002  241.540001  238.040001  239.950005  143247300   31.087815
2010-04-09  241.430012  241.889996  240.460003  241.789993   83545700   31.326203
2010-04-12  242.199989  243.069996  241.809994  242.290005   83256600   31.390984
2010-04-13  241.860008  242.800003  241.110004  242.430008   76552700   31.409123
2010-04-14  245.280006  245.810005  244.069992  245.690002  101019100   31.831486
2010-04-15  245.779991  249.029999  245.509998  248.920010   94196200   32.249965
...                ...         ...         ...         ...        ...         ...
2017-04-13  141.910004  142.380005  141.050003  141.050003   17652900  141.050003
2017-04-17  141.479996  141.880005  140.869995  141.830002   16424000  141.830002
2017-04-18  141.410004  142.039993  141.110001  141.199997   14660800  141.199997
2017-04-19  141.880005  142.000000  140.449997  140.679993   17271300  140.679993
2017-04-20  141.220001  142.919998  141.160004  142.440002   23251100  142.440002
2017-04-21  142.440002  142.679993  141.850006  142.270004   17245200  142.270004
2017-04-24  143.500000  143.949997  143.179993  143.639999   17099200  143.639999
2017-04-25  143.910004  144.899994  143.869995  144.529999   18290300  144.529999
2017-04-26  144.470001  144.600006  143.380005  143.679993   19769400  143.679993
2017-04-27  143.919998  144.160004  143.309998  143.789993   14106100  143.789993

[1781 rows x 6 columns]

In [74]: df.last('5M')
Out[74]:
                  Open        High         Low       Close    Volume   Adj Close
Date
2016-12-01  110.370003  110.940002  109.029999  109.489998  37086900  109.017344
2016-12-02  109.169998  110.089996  108.849998  109.900002  26528000  109.425578
2016-12-05  110.000000  110.029999  108.250000  109.110001  34324500  108.638987
2016-12-06  109.500000  110.360001  109.190002  109.949997  26195500  109.475358
2016-12-07  109.260002  111.190002  109.160004  111.029999  29998700  110.550697
2016-12-08  110.860001  112.430000  110.599998  112.120003  27068300  111.635996
2016-12-09  112.309998  114.699997  112.309998  113.949997  34402600  113.458090
2016-12-12  113.290001  115.000000  112.489998  113.300003  26374400  112.810902
2016-12-13  113.839996  115.919998  113.750000  115.190002  43733800  114.692743
2016-12-14  115.040001  116.199997  114.980003  115.190002  34031800  114.692743
...                ...         ...         ...         ...       ...         ...
2017-04-13  141.910004  142.380005  141.050003  141.050003  17652900  141.050003
2017-04-17  141.479996  141.880005  140.869995  141.830002  16424000  141.830002
2017-04-18  141.410004  142.039993  141.110001  141.199997  14660800  141.199997
2017-04-19  141.880005  142.000000  140.449997  140.679993  17271300  140.679993
2017-04-20  141.220001  142.919998  141.160004  142.440002  23251100  142.440002
2017-04-21  142.440002  142.679993  141.850006  142.270004  17245200  142.270004
2017-04-24  143.500000  143.949997  143.179993  143.639999  17099200  143.639999
2017-04-25  143.910004  144.899994  143.869995  144.529999  18290300  144.529999
2017-04-26  144.470001  144.600006  143.380005  143.679993  19769400  143.679993
2017-04-27  143.919998  144.160004  143.309998  143.789993  14106100  143.789993

[101 rows x 6 columns]

答案 1 :(得分:2)

解决了它。问题是KDB返回的数据按DESC顺序排序,这使方法last()混乱。

解决方案是在查询中添加一个排序子句(在Q语言中,它带有backtick followed by the keyword xasc

df = conn.sync("`Date xasc select Date, Open, High, Low, Close from stocktable", pandas=True) \
     .last("5M")

或者,要对Pandas数据帧本身中的数据进行排序。

df_sorted = stocktable.dataframe() \
    .sort_values(by="Date",ascending=True) \
    .set_index("Date")
    .last("5M")

答案 2 :(得分:0)

对我来说它很好用:

rng = pd.date_range('2017-04-03', periods=10, freq='20D')
df = pd.DataFrame({'Date': rng, 'a': range(10)})  
print (df)
        Date  a
0 2017-04-03  0
1 2017-04-23  1
2 2017-05-13  2
3 2017-06-02  3
4 2017-06-22  4
5 2017-07-12  5
6 2017-08-01  6
7 2017-08-21  7
8 2017-09-10  8
9 2017-09-30  9

df = df.set_index('Date').last('5M')
print (df)
            a
Date         
2017-05-13  2
2017-06-02  3
2017-06-22  4
2017-07-12  5
2017-08-01  6
2017-08-21  7
2017-09-10  8
2017-09-30  9

它也适用于重复项,只需要排序DateTime列:

rng = pd.date_range('2017-04-03', periods=10, freq='20D')
df = pd.DataFrame({'Date': rng, 'a': range(10)})  
df = pd.concat([df,df], ignore_index=True).sort_values('Date')
print (df)
         Date  a
0  2017-04-03  0
10 2017-04-03  0
1  2017-04-23  1
11 2017-04-23  1
2  2017-05-13  2
12 2017-05-13  2
3  2017-06-02  3
13 2017-06-02  3
4  2017-06-22  4
14 2017-06-22  4
5  2017-07-12  5
15 2017-07-12  5
6  2017-08-01  6
16 2017-08-01  6
17 2017-08-21  7
7  2017-08-21  7
18 2017-09-10  8
8  2017-09-10  8
9  2017-09-30  9
19 2017-09-30  9

df = df.set_index('Date').last('5M')
print (df)
            a
Date         
2017-05-13  2
2017-05-13  2
2017-06-02  3
2017-06-02  3
2017-06-22  4
2017-06-22  4
2017-07-12  5
2017-07-12  5
2017-08-01  6
2017-08-01  6
2017-08-21  7
2017-08-21  7
2017-09-10  8
2017-09-10  8
2017-09-30  9
2017-09-30  9
rng = pd.date_range('2017-04-03', periods=10, freq='20D')
df = pd.DataFrame({'Date': rng, 'a': range(10)})  
df = pd.concat([df,df], ignore_index=True)
print (df)
         Date  a
0  2017-04-03  0
1  2017-04-23  1
2  2017-05-13  2
3  2017-06-02  3
4  2017-06-22  4
5  2017-07-12  5
6  2017-08-01  6
7  2017-08-21  7
8  2017-09-10  8
9  2017-09-30  9
10 2017-04-03  0
11 2017-04-23  1
12 2017-05-13  2
13 2017-06-02  3
14 2017-06-22  4
15 2017-07-12  5
16 2017-08-01  6
17 2017-08-21  7
18 2017-09-10  8
19 2017-09-30  9

df = df.set_index('Date').last('5M')
print (df)
            a
Date         
2017-05-13  2
2017-06-02  3
2017-06-22  4
2017-07-12  5
2017-08-01  6
2017-08-21  7
2017-09-10  8
2017-09-30  9