How do I fix erratic NaN values produced by column operations in pandas?

Time: 2019-06-25 14:58:44

Tags: python pandas

I have some forex data here, and I am trying to run some pandas operations on it.

    import pandas as pd
    import numpy as np

    df = pd.read_excel(r"History_M1.xlsx", sheet_name='Sheet1', dtype={'high': float, 'low':float, 'open':float, 'close':float, 'hour': str})
    df['time'] = pd.to_datetime(df['time'], utc=True)
    df.set_index('time', inplace=True)                        
    df[['high','low','open','close']] = df[['high','low','open','close']].apply(pd.to_numeric, errors='coerce')
    df['hour'] = df.index.hour
    df['hl'] = (df['high'] - df['low'])*10**4
    df['oc'] = (df['close'] - df['open'])*10**4
    df['ab'] = (df['close'] - df['open']).abs()*10**4
    df['dir'] = df[['close','open']].apply(lambda x: 1 if x['close'] > x['open'] else -1, axis=1)

I downsampled df to an hourly frequency and performed some column operations.

    dfh = df[['volume','high','low','open','close']].resample('1H').agg({'volume': 'sum','open': 'first','high': 'max','low': 'min','close': 'last'}).ffill()
    dfh['day'] = dfh.index.weekday
    dfh['hour'] = dfh.index.hour
    dfh['hl'] = (dfh['high'] - dfh['low'])*10**4
    dfh['oc'] = (dfh['close'] - dfh['open'])*10**4
    dfh['ab'] = (dfh['close'] - df['open']).abs()*10**4
    dfh['dir'] = dfh[['close','open']].apply(lambda x: 1 if x['close'] > x['open'] else -1, axis=1)

dfh['ab'] contains some NaN values for no apparent reason. How can I fix this?


1 answer:

Answer 0 (score: 1)

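It probably fails because you wrote df where you meant dfh here:

    dfh['ab'] = (dfh['close'] - df['open']).abs()*10**4   # should be dfh['open']

pandas aligns the two Series on their index before subtracting, so any hourly timestamp in dfh that has no exactly matching row in the minute-level df comes out as NaN.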
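To see how mixing the two frames produces the NaN values, here is a minimal sketch. The four rows of minute-level data are made up for illustration (they are not the asker's file), with a deliberate gap so that no row exists for the 02:00 hour.

```python
import pandas as pd

# Hypothetical minute-level data; there are no rows during the 02:00 hour.
idx = pd.to_datetime(
    ["2019-06-25 00:00", "2019-06-25 00:30", "2019-06-25 01:00", "2019-06-25 03:00"],
    utc=True,
)
df = pd.DataFrame(
    {"open": [1.10, 1.11, 1.12, 1.13], "close": [1.11, 1.12, 1.13, 1.14]},
    index=idx,
)

# Hourly bars; ffill() fills the empty 02:00 bar from the previous hour.
dfh = df.resample("1h").agg({"open": "first", "close": "last"}).ffill()

# Buggy: dfh['close'] and df['open'] have different indexes, so pandas aligns
# them by timestamp. 02:00 exists only in dfh, so the result there is NaN.
dfh["ab_buggy"] = (dfh["close"] - df["open"]).abs() * 10**4

# Fixed: both operands come from dfh and share the same hourly index.
dfh["ab_fixed"] = (dfh["close"] - dfh["open"]).abs() * 10**4

print(dfh[["ab_buggy", "ab_fixed"]])
```

In this sketch only the 02:00 row of ab_buggy is NaN, which is why the missing values look erratic: they appear wherever the minute data has no row stamped exactly on the hour.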

Also try replacing this lambda operation

    df['dir'] = df[['close','open']].apply(lambda x: 1 if x['close'] > x['open'] else -1, axis=1)

with a vectorized numpy operation (much faster):

    df['dir'] = np.where(df['close'] > df['open'], 1, -1)
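As a quick sanity check that the two forms agree, the sketch below runs both on a small made-up frame (the values are illustrative only) that includes a flat bar and a missing open.

```python
import numpy as np
import pandas as pd

# Hypothetical bars: up, down, flat (close == open), and one with a missing open.
df = pd.DataFrame(
    {"open": [1.10, 1.12, 1.13, np.nan], "close": [1.11, 1.11, 1.13, 1.14]}
)

# Row-wise version from the question.
dir_apply = df[["close", "open"]].apply(
    lambda x: 1 if x["close"] > x["open"] else -1, axis=1
)

# Vectorized version: one comparison over the whole columns.
dir_where = np.where(df["close"] > df["open"], 1, -1)

print(dir_apply.tolist())
print(dir_where.tolist())
```

Both give 1 only for the first row. Note that flat bars and rows with a NaN price are labelled -1 in either version, because the comparison evaluates to False for them.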