Hour and minute means in a pandas MultiIndex DataFrame

Time: 2019-12-21 11:13:30

Tags: python python-3.x pandas datetime multi-index

I have the following code:

import pandas as pd
from pandas import DataFrame as df
import matplotlib
from pandas_datareader import data as web
import matplotlib.pyplot as plt
import datetime
import fxcmpy
import numpy as np

# `con` is an already-open fxcmpy connection; creating it is not shown here.
print(con.get_instruments())
symbols = con.get_instruments()

ticker = 'NGAS'
start = datetime.datetime(2015, 1, 1)
end = datetime.datetime.today()
# Fetch 10000 one-minute candles for the ticker.
data = con.get_candles(ticker, period='m1', number=10000)

data.index = pd.to_datetime(data.index, format='%Y-%m-%d %H:%M:%S')
data['hour'] = data.index.hour
data['minute'] = data.index.minute
data.set_index(['hour', 'minute'], inplace=True)

This gives me the following output:

             bidopen  bidclose  bidhigh  bidlow  askopen  askclose  askhigh  asklow  tickqty
hour minute
10   52       2.2400    2.2395   2.2395  2.2390   2.2475    2.2470   2.2475  2.2470        3
     53       2.2395    2.2415   2.2415  2.2395   2.2470    2.2490   2.2490  2.2475        8
     54       2.2415    2.2415   2.2415  2.2410   2.2490    2.2490   2.2490  2.2485        4
     56       2.2415    2.2415   2.2415  2.2415   2.2490    2.2490   2.2490  2.2490        2
     57       2.2415    2.2410   2.2415  2.2400   2.2490    2.2485   2.2490  2.2480        8
...  ...         ...       ...      ...     ...      ...       ...      ...     ...      ...
21   39       2.3385    2.3385   2.3395  2.3380   2.3465    2.3460   2.3470  2.3460       10
     41       2.3385    2.3375   2.3385  2.3370   2.3460    2.3460   2.3460  2.3460        4
     42       2.3375    2.3365   2.3385  2.3360   2.3460    2.3440   2.3460  2.3440       10
     43       2.3365    2.3375   2.3385  2.3360   2.3440    2.3450   2.3460  2.3440       15
     44       2.3375    2.3365   2.3375  2.3360   2.3450    2.3445   2.3450  2.3440        5

10000 rows × 9 columns

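What I want is the mean of bidlow for every hour and minute combination, all in one table: from the mean bidlow for minute 1 of hour 1 through to the mean bidlow for minute 44 of hour 21. How can I do this?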

1 Answer:

Answer 0: (score: 2)

I think the best approach here is DataFrame.between_time, which works on a DatetimeIndex:

data = con.get_candles(ticker, period='m1', number=10000)

# The index is already a DatetimeIndex, so converting it is not necessary.
# data.index = pd.to_datetime(data.index, format='%Y-%m-%d %H:%M:%S')
data['hour'] = data.index.hour
data['minute'] = data.index.minute

# print(data)

First, filter the rows to the time-of-day window between the two times:

data2 = data.between_time('01:01:00', '21:44:00').copy()
print(data2)
                     bidopen  bidclose  bidhigh  bidlow  askopen  askclose  \
date                                                                         
2019-12-10 10:52:00   2.2400    2.2395   2.2395  2.2390   2.2475    2.2470   
2019-12-10 10:53:00   2.2395    2.2415   2.2415  2.2395   2.2470    2.2490   
2019-12-10 10:54:00   2.2415    2.2415   2.2415  2.2410   2.2490    2.2490   
2019-12-10 10:56:00   2.2415    2.2415   2.2415  2.2415   2.2490    2.2490   
2019-12-10 10:57:00   2.2415    2.2410   2.2415  2.2400   2.2490    2.2485   
                     ...       ...      ...     ...      ...       ...   
2019-12-20 21:39:00   2.3385    2.3385   2.3395  2.3380   2.3465    2.3460   
2019-12-20 21:41:00   2.3385    2.3375   2.3385  2.3370   2.3460    2.3460   
2019-12-20 21:42:00   2.3375    2.3365   2.3385  2.3360   2.3460    2.3440   
2019-12-20 21:43:00   2.3365    2.3375   2.3385  2.3360   2.3440    2.3450   
2019-12-20 21:44:00   2.3375    2.3365   2.3375  2.3360   2.3450    2.3445   

                     askhigh  asklow  tickqty  hour  minute  
date                                                         
2019-12-10 10:52:00   2.2475  2.2470        3    10      52  
2019-12-10 10:53:00   2.2490  2.2475        8    10      53  
2019-12-10 10:54:00   2.2490  2.2485        4    10      54  
2019-12-10 10:56:00   2.2490  2.2490        2    10      56  
2019-12-10 10:57:00   2.2490  2.2480        8    10      57  
                     ...     ...      ...   ...     ...  
2019-12-20 21:39:00   2.3470  2.3460       10    21      39  
2019-12-20 21:41:00   2.3460  2.3460        4    21      41  
2019-12-20 21:42:00   2.3460  2.3440       10    21      42  
2019-12-20 21:43:00   2.3460  2.3440       15    21      43  
2019-12-20 21:44:00   2.3450  2.3440        5    21      44  
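
To see what the filter does without an FXCM connection, here is a minimal sketch on synthetic data. The frame `demo` and its contents are made up for illustration; only `between_time` itself comes from the answer. It keeps rows by time of day on every date, with both endpoints included by default.

```python
import pandas as pd

# Synthetic one-minute index covering two full days (an assumption for
# illustration; the real frame comes from con.get_candles).
idx = pd.date_range("2019-12-10 00:00", periods=2 * 24 * 60, freq="min")
demo = pd.DataFrame({"bidlow": range(len(idx))}, index=idx)

# Keep rows whose time of day lies between 01:01 and 21:44, on every date.
window = demo.between_time("01:01:00", "21:44:00")

print(window.index.min())  # first kept timestamp: 2019-12-10 01:01:00
print(window.index.max())  # last kept timestamp:  2019-12-11 21:44:00
print(len(window))         # 2488 (1244 minutes per day x 2 days)
```

Note that `between_time` requires a DatetimeIndex, so it has to run before any `set_index(['hour', 'minute'])` call replaces the index.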

Then aggregate the mean of bidlow per hour and minute:

data3 = data2.groupby(['hour','minute'], as_index=False)['bidlow'].mean()
print(data3)
      hour  minute    bidlow
0        1       1  2.290750
1        1       2  2.316000
2        1       3  2.305071
3        1       4  2.304857
4        1       5  2.302125
   ...     ...       ...
1239    21      40  2.284167
1240    21      41  2.328000
1241    21      42  2.323400
1242    21      43  2.291100
1243    21      44  2.315786

[1244 rows x 3 columns]
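
As a possible variation, the helper columns can be skipped by grouping directly on the index attributes, which yields a Series with an (hour, minute) MultiIndex. This is only a sketch on synthetic data: `demo`, `by_minute`, `indexed`, and `by_level` are hypothetical names, and the values are invented so the means are easy to check.

```python
import numpy as np
import pandas as pd

# Three synthetic days of one-minute data; each day has a constant bidlow,
# so the mean for every (hour, minute) pair is (2.0 + 2.5 + 3.0) / 3 = 2.5.
idx = pd.date_range("2019-12-10 00:00", periods=3 * 24 * 60, freq="min")
demo = pd.DataFrame({"bidlow": np.repeat([2.0, 2.5, 3.0], 24 * 60)}, index=idx)

# Group on hour and minute taken straight from the DatetimeIndex.
by_minute = (
    demo.groupby([demo.index.hour, demo.index.minute])["bidlow"]
    .mean()
    .rename_axis(["hour", "minute"])
)
print(by_minute.loc[(21, 44)])  # 2.5

# If the frame is already indexed by (hour, minute), as in the question,
# grouping on the index levels gives the same means.
indexed = demo.assign(hour=demo.index.hour, minute=demo.index.minute)
indexed = indexed.set_index(["hour", "minute"])
by_level = indexed.groupby(level=["hour", "minute"])["bidlow"].mean()
print(len(by_level))  # 1440, one row per minute of the day
```

Calling `by_minute.unstack()` would turn this into an hour-by-minute table, if a wide layout is easier to read.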