I have a Pandas DataFrame that was produced by pulling data out of a KDB database with QPython.
First, the Date column comes back with an odd-looking dtype: dtype('<M8[ns]')
df = conn.sync("select Date, Open, High, Low, Close from stocktable", pandas=True)
df["Date"].dtype
# dtype('<M8[ns]')
However, when I print the column, the last line of the output reports the dtype as datetime64[ns]:
0 2017-04-17
1 2017-04-13
2 2017-04-12
3 2017-04-11
4 2017-04-10
5 2017-04-07
6 2017-04-06
7 2017-04-05
8 2017-04-04
9 2017-04-03
10 2017-03-31
11 2017-03-30
...
3180 2004-08-27
3181 2004-08-26
3182 2004-08-25
3183 2004-08-24
3184 2004-08-23
3185 2004-08-20
3186 2004-08-19
Name: Date, dtype: datetime64[ns]
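For what it's worth, '<M8[ns]' is NumPy's type string for a little-endian datetime64 with nanosecond resolution, so both outputs describe the same dtype. A minimal check, assuming a little-endian machine (x86 and most ARM systems); the two-element Series is a hypothetical stand-in for the KDB column:

```python
import numpy as np
import pandas as pd

# Stand-in for the KDB Date column, built from an explicit datetime64[ns] array.
s = pd.Series(np.array(["2017-04-17", "2004-08-19"], dtype="datetime64[ns]"), name="Date")

# '<M8[ns]' = little-endian ('<'), datetime kind ('M'), 8 bytes, nanosecond unit.
print(np.dtype("<M8[ns]") == np.dtype("datetime64[ns]"))  # True on little-endian machines
print(s.dtype == np.dtype("<M8[ns]"))                      # True
print(pd.api.types.is_datetime64_any_dtype(s))             # True
```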
In addition, the last() method does not behave as expected: I ask for the last 5 months of data, but the entire DataFrame is returned.
# Expected to only return last 5 months of data, but returns it all.
df.set_index("Date").last("5M")
How can I get only the most recent rows of this DataFrame?
Answer 0 (score: 3)
It works for me.
Demo:
In [71]: from pandas_datareader import data as web
In [72]: df = web.DataReader('AAPL', 'yahoo', '2010-04-01')
In [73]: df
Out[73]:
Open High Low Close Volume Adj Close
Date
2010-04-01 237.410000 238.730003 232.750000 235.969994 150786300 30.572166
2010-04-05 234.980011 238.509998 234.769993 238.489998 171126900 30.898657
2010-04-06 238.200005 240.239998 237.000004 239.540009 111754300 31.034696
2010-04-07 239.549995 241.920010 238.659988 240.600006 157125500 31.172029
2010-04-08 240.440002 241.540001 238.040001 239.950005 143247300 31.087815
2010-04-09 241.430012 241.889996 240.460003 241.789993 83545700 31.326203
2010-04-12 242.199989 243.069996 241.809994 242.290005 83256600 31.390984
2010-04-13 241.860008 242.800003 241.110004 242.430008 76552700 31.409123
2010-04-14 245.280006 245.810005 244.069992 245.690002 101019100 31.831486
2010-04-15 245.779991 249.029999 245.509998 248.920010 94196200 32.249965
... ... ... ... ... ... ...
2017-04-13 141.910004 142.380005 141.050003 141.050003 17652900 141.050003
2017-04-17 141.479996 141.880005 140.869995 141.830002 16424000 141.830002
2017-04-18 141.410004 142.039993 141.110001 141.199997 14660800 141.199997
2017-04-19 141.880005 142.000000 140.449997 140.679993 17271300 140.679993
2017-04-20 141.220001 142.919998 141.160004 142.440002 23251100 142.440002
2017-04-21 142.440002 142.679993 141.850006 142.270004 17245200 142.270004
2017-04-24 143.500000 143.949997 143.179993 143.639999 17099200 143.639999
2017-04-25 143.910004 144.899994 143.869995 144.529999 18290300 144.529999
2017-04-26 144.470001 144.600006 143.380005 143.679993 19769400 143.679993
2017-04-27 143.919998 144.160004 143.309998 143.789993 14106100 143.789993
[1781 rows x 6 columns]
In [74]: df.last('5M')
Out[74]:
Open High Low Close Volume Adj Close
Date
2016-12-01 110.370003 110.940002 109.029999 109.489998 37086900 109.017344
2016-12-02 109.169998 110.089996 108.849998 109.900002 26528000 109.425578
2016-12-05 110.000000 110.029999 108.250000 109.110001 34324500 108.638987
2016-12-06 109.500000 110.360001 109.190002 109.949997 26195500 109.475358
2016-12-07 109.260002 111.190002 109.160004 111.029999 29998700 110.550697
2016-12-08 110.860001 112.430000 110.599998 112.120003 27068300 111.635996
2016-12-09 112.309998 114.699997 112.309998 113.949997 34402600 113.458090
2016-12-12 113.290001 115.000000 112.489998 113.300003 26374400 112.810902
2016-12-13 113.839996 115.919998 113.750000 115.190002 43733800 114.692743
2016-12-14 115.040001 116.199997 114.980003 115.190002 34031800 114.692743
... ... ... ... ... ... ...
2017-04-13 141.910004 142.380005 141.050003 141.050003 17652900 141.050003
2017-04-17 141.479996 141.880005 140.869995 141.830002 16424000 141.830002
2017-04-18 141.410004 142.039993 141.110001 141.199997 14660800 141.199997
2017-04-19 141.880005 142.000000 140.449997 140.679993 17271300 140.679993
2017-04-20 141.220001 142.919998 141.160004 142.440002 23251100 142.440002
2017-04-21 142.440002 142.679993 141.850006 142.270004 17245200 142.270004
2017-04-24 143.500000 143.949997 143.179993 143.639999 17099200 143.639999
2017-04-25 143.910004 144.899994 143.869995 144.529999 18290300 144.529999
2017-04-26 144.470001 144.600006 143.380005 143.679993 19769400 143.679993
2017-04-27 143.919998 144.160004 143.309998 143.789993 14106100 143.789993
[101 rows x 6 columns]
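Answer 1 (score: 2)
Solved it. The problem was that KDB returns the data sorted in descending (DESC) order, which confuses the last() method.
The solution is to add a sort clause to the query (in q, this is the column name as a backtick symbol followed by the keyword xasc):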
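To see why descending order defeats last(), here is a sketch that mirrors its cut-point logic as implemented in pandas 1.x/2.x (an assumption; check your installed version): take the final index label, step back n month-ends, then binary-search for the cut. A binary search only makes sense on an ascending index. The four dates and the helper start_position are hypothetical.

```python
import pandas as pd

# Hypothetical small frame sorted newest-first, like the KDB result.
idx = pd.DatetimeIndex(["2017-04-17", "2017-03-31", "2016-10-03", "2004-08-19"], name="Date")
df_desc = pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0]}, index=idx)

def start_position(index: pd.DatetimeIndex, months: int) -> int:
    """Hypothetical helper: the row position where last(f"{months}M") would start slicing."""
    # Anchor on the LAST label (the oldest date when sorted descending).
    start_date = index[-1] - pd.offsets.MonthEnd(months)
    # searchsorted assumes ascending order; on a descending index it returns 0 here.
    return int(index.searchsorted(start_date, side="right"))

print(start_position(df_desc.index, 5))               # 0 -> iloc[0:] keeps every row
print(start_position(df_desc.sort_index().index, 5))  # 2 -> only the two 2017 rows are kept
```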
df = conn.sync("`Date xasc select Date, Open, High, Low, Close from stocktable", pandas=True) \
    .set_index("Date") \
    .last("5M")
Alternatively, sort the data in the Pandas DataFrame itself:
df_sorted = df \
    .sort_values(by="Date", ascending=True) \
    .set_index("Date") \
    .last("5M")
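Note that DataFrame.last() was deprecated in pandas 2.1. A minimal version-independent sketch, assuming a Date column as in the question (last_n_months is a hypothetical helper, not a pandas API), sorts first and then keeps the rows after the same month-end cutoff:

```python
import pandas as pd

def last_n_months(df: pd.DataFrame, n: int, date_col: str = "Date") -> pd.DataFrame:
    """Hypothetical helper: rows from the last n months, whatever the input order.

    Intended to match df.sort_values(date_col).set_index(date_col).last(f"{n}M")
    without calling the deprecated DataFrame.last().
    """
    out = df.sort_values(date_col, kind="stable").set_index(date_col)
    if out.empty:
        return out
    # Step back n month-ends from the newest date and keep strictly later rows.
    cutoff = out.index[-1] - pd.offsets.MonthEnd(n)
    return out[out.index > cutoff]

# Newest-first input, like the KDB result.
rng = pd.date_range("2017-04-03", periods=10, freq="20D")
desc = pd.DataFrame({"Date": rng[::-1], "a": range(9, -1, -1)})
print(last_n_months(desc, 5))
```

Sorting inside the helper means callers never have to remember the ordering contract that last() silently relies on.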
Answer 2 (score: 0)
It works fine for me:
import pandas as pd

rng = pd.date_range('2017-04-03', periods=10, freq='20D')
df = pd.DataFrame({'Date': rng, 'a': range(10)})
print (df)
Date a
0 2017-04-03 0
1 2017-04-23 1
2 2017-05-13 2
3 2017-06-02 3
4 2017-06-22 4
5 2017-07-12 5
6 2017-08-01 6
7 2017-08-21 7
8 2017-09-10 8
9 2017-09-30 9
df = df.set_index('Date').last('5M')
print (df)
a
Date
2017-05-13 2
2017-06-02 3
2017-06-22 4
2017-07-12 5
2017-08-01 6
2017-08-21 7
2017-09-10 8
2017-09-30 9
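The first kept row is 2017-05-13 rather than 2017-04-23 because "5M" means five month-end offsets, not five 30-day periods. Assuming last() keeps rows strictly after the final index label minus five month-ends (which is how pandas 1.x/2.x implement it, as far as I know), the cutoffs for this example and for the AAPL demo work out as:

```python
import pandas as pd

five_month_ends = pd.offsets.MonthEnd(5)

# This example: the last label 2017-09-30 is itself a month end.
print(pd.Timestamp("2017-09-30") - five_month_ends)  # 2017-04-30 00:00:00 -> 2017-04-23 dropped, 2017-05-13 kept

# AAPL demo: 2017-04-27 is mid-month, so the first step only rolls back to 2017-03-31.
print(pd.Timestamp("2017-04-27") - five_month_ends)  # 2016-11-30 00:00:00 -> first kept row is 2016-12-01
```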
It also works with duplicates; the datetime column just needs to be sorted:
rng = pd.date_range('2017-04-03', periods=10, freq='20D')
df = pd.DataFrame({'Date': rng, 'a': range(10)})
df = pd.concat([df,df], ignore_index=True).sort_values('Date')
print (df)
Date a
0 2017-04-03 0
10 2017-04-03 0
1 2017-04-23 1
11 2017-04-23 1
2 2017-05-13 2
12 2017-05-13 2
3 2017-06-02 3
13 2017-06-02 3
4 2017-06-22 4
14 2017-06-22 4
5 2017-07-12 5
15 2017-07-12 5
6 2017-08-01 6
16 2017-08-01 6
17 2017-08-21 7
7 2017-08-21 7
18 2017-09-10 8
8 2017-09-10 8
9 2017-09-30 9
19 2017-09-30 9
df = df.set_index('Date').last('5M')
print (df)
a
Date
2017-05-13 2
2017-05-13 2
2017-06-02 3
2017-06-02 3
2017-06-22 4
2017-06-22 4
2017-07-12 5
2017-07-12 5
2017-08-01 6
2017-08-01 6
2017-08-21 7
2017-08-21 7
2017-09-10 8
2017-09-10 8
2017-09-30 9
2017-09-30 9
Without sorting, however, the result is silently cut short (8 rows instead of the expected 16):
rng = pd.date_range('2017-04-03', periods=10, freq='20D')
df = pd.DataFrame({'Date': rng, 'a': range(10)})
df = pd.concat([df,df], ignore_index=True)
print (df)
Date a
0 2017-04-03 0
1 2017-04-23 1
2 2017-05-13 2
3 2017-06-02 3
4 2017-06-22 4
5 2017-07-12 5
6 2017-08-01 6
7 2017-08-21 7
8 2017-09-10 8
9 2017-09-30 9
10 2017-04-03 0
11 2017-04-23 1
12 2017-05-13 2
13 2017-06-02 3
14 2017-06-22 4
15 2017-07-12 5
16 2017-08-01 6
17 2017-08-21 7
18 2017-09-10 8
19 2017-09-30 9
df = df.set_index('Date').last('5M')
print (df)
a
Date
2017-05-13 2
2017-06-02 3
2017-06-22 4
2017-07-12 5
2017-08-01 6
2017-08-21 7
2017-09-10 8
2017-09-30 9
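The last run returns 8 rows instead of 16 because the concatenated index is not sorted. A minimal sketch of a guard, assuming you prefer to check order explicitly: test is_monotonic_increasing, sort with a stable sort_index, and only then cut. The month-end cutoff below is a stand-in for last('5M'), not a pandas API.

```python
import pandas as pd

rng = pd.date_range("2017-04-03", periods=10, freq="20D")
df = pd.concat([pd.DataFrame({"Date": rng, "a": range(10)})] * 2, ignore_index=True)
indexed = df.set_index("Date")

# Two stacked copies: the index runs 04-03 ... 09-30, then jumps back to 04-03.
print(indexed.index.is_monotonic_increasing)  # False

if not indexed.index.is_monotonic_increasing:
    indexed = indexed.sort_index(kind="stable")

cutoff = indexed.index[-1] - pd.offsets.MonthEnd(5)
recent = indexed[indexed.index > cutoff]
print(len(recent))  # 16: both copies of the 8 dates after 2017-04-30
```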