I have some forex data here, and I am trying to run a few pandas operations on it.
import pandas as pd
import numpy as np
df = pd.read_excel(r"History_M1.xlsx", sheet_name='Sheet1', dtype={'high': float, 'low':float, 'open':float, 'close':float, 'hour': str})
df['time'] = pd.to_datetime(df['time'], utc=True)
df.set_index('time', inplace=True)
df[['high','low','open','close']] = df[['high','low','open','close']].apply(pd.to_numeric, errors='coerce')
df['hour'] = df.index.hour
df['hl'] = (df['high'] - df['low'])*10**4
df['oc'] = (df['close'] - df['open'])*10**4
df['ab'] = (df['close'] - df['open']).abs()*10**4
df['dir'] = df[['close','open']].apply(lambda x: 1 if x['close'] > x['open'] else -1, axis=1)
I downsampled df to an hourly frequency and performed some column operations on the result.
dfh = df[['volume','high','low','open','close']].resample('1H').agg({'volume': 'sum','open': 'first','high': 'max','low': 'min','close': 'last'}).ffill()
dfh['day'] = dfh.index.weekday
dfh['hour'] = dfh.index.hour
dfh['hl'] = (dfh['high'] - dfh['low'])*10**4
dfh['oc'] = (dfh['close'] - dfh['open'])*10**4
dfh['ab'] = (dfh['close'] - df['open']).abs()*10**4
dfh['dir'] = dfh[['close','open']].apply(lambda x: 1 if x['close'] > x['open'] else -1, axis=1)
The column dfh['ab'] ends up with some NaN values for no apparent reason. How can I fix this?
Answer 0 (score: 1)
It probably doesn't work because you used df where you meant dfh here:
dfh['ab'] = (dfh['close'] - df['open']).abs()*10**4 # should be dfh['open']
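To see why mixing the two frames can produce NaN, here is a minimal sketch on made-up data (the timestamps, prices, and the column names ab_buggy and ab_fixed are assumptions for illustration, not taken from your file). pandas aligns the two Series on their indexes before subtracting, so any hourly timestamp that has no minute row at exactly that time gets NaN:

```python
import pandas as pd

# Hypothetical minute-level data with a gap: there is no row at exactly 01:00.
idx = pd.to_datetime(
    ["2021-01-01 00:00", "2021-01-01 00:30",
     "2021-01-01 01:30", "2021-01-01 02:00"],
    utc=True,
)
df = pd.DataFrame(
    {"open": [1.0, 1.1, 1.2, 1.3], "close": [1.1, 1.2, 1.3, 1.4]},
    index=idx,
)

# Downsample to hourly bars (a Timedelta is used as the rule here).
dfh = df.resample(pd.Timedelta(hours=1)).agg({"open": "first", "close": "last"}).ffill()

# Buggy: subtracts the minute-level 'open', aligned on the index.
# The 01:00 bar has no matching minute row, so it becomes NaN.
dfh["ab_buggy"] = (dfh["close"] - df["open"]).abs() * 10**4

# Fixed: both operands come from the hourly frame, so the indexes match.
dfh["ab_fixed"] = (dfh["close"] - dfh["open"]).abs() * 10**4

print(dfh[["ab_buggy", "ab_fixed"]])
```

In this sketch only the 01:00 row of ab_buggy is NaN, while ab_fixed has a value for every hour.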
Also, try replacing this lambda operation
df['dir'] = df[['close','open']].apply(lambda x: 1 if x['close'] > x['open'] else -1, axis=1)
with a numpy operation (much faster):
df['dir'] = np.where(df['close'] > df['open'], 1, -1)
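As a quick sanity check, this small sketch on made-up prices (the values and the names dir_apply and dir_where are assumptions for illustration) suggests the two versions agree, including the case where close equals open, which both map to -1:

```python
import numpy as np
import pandas as pd

# Hypothetical prices: one up bar, one down bar, one unchanged bar.
df = pd.DataFrame({"open": [1.0, 1.2, 1.3], "close": [1.1, 1.1, 1.3]})

# Row-wise lambda, as in the question.
dir_apply = df[["close", "open"]].apply(
    lambda x: 1 if x["close"] > x["open"] else -1, axis=1
)

# Vectorized equivalent: one comparison over the whole column.
dir_where = np.where(df["close"] > df["open"], 1, -1)

# Both approaches should give the same direction for every row.
assert (dir_apply.to_numpy() == dir_where).all()
```

The same np.where line should work unchanged on the hourly frame, with dfh in place of df.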