Python: How to modify null data in pandas

Time: 2017-06-03 08:26:52

Tags: python pandas dataframe

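I have a set of data in which some of the datasets contain 'null' rows that need to be corrected.

The data in the pandas DataFrame should be corrected according to the following rules.

  1. When Volume is null, change it to 0.

  2. Open, High, Low, and Close follow the previous day's Close. For example, the null row for 2016-06-29 follows the 2016-06-28 Close of 0.6.

  3. If the first row is null, then Volume = 0, and Open, High, Low, and Close follow the next day's Open value.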

    >>df_a   
    Date,Stock,Open,High,Low,Close,Adj Close,Volume
    2016-06-22,AWG,0.600000,0.600000,0.600000,0.600000,0.600000,0
    2016-06-23,AWG,0.600000,0.600000,0.600000,0.600000,0.600000,0
    2016-06-24,AWG,0.600000,0.600000,0.600000,0.600000,0.600000,0
    2016-06-27,AWG,0.600000,0.600000,0.600000,0.600000,0.600000,800
    2016-06-28,AWG,0.600000,0.600000,0.600000,0.600000,0.600000,0
    2016-06-29,AWG,null,null,null,null,null,null
    2016-06-30,AWG,null,null,null,null,null,null
    2016-07-01,AWG,0.620000,0.650000,0.620000,0.650000,0.650000,40000
    2016-07-04,AWG,null,null,null,null,null,null
    2016-07-05,AWG,null,null,null,null,null,null
    2016-07-07,AWG,0.625000,0.650000,0.565000,0.650000,0.650000,3000
    2016-07-08,AWG,0.650000,0.650000,0.650000,0.650000,0.650000,0
    2016-07-11,AWG,0.650000,0.650000,0.605000,0.605000,0.605000,6000
    2016-07-12,AWG,0.640000,0.640000,0.640000,0.640000,0.640000,3300
    
    >>df_b
    Date,Stock,Open,High,Low,Close,Adj Close,Volume
    2016-06-10,WG,null,null,null,null,null,null
    2016-06-13,WG,null,null,null,null,null,null
    2016-06-14,WG,0.600000,0.600000,0.600000,0.600000,0.600000,1000
    2016-06-15,WG,0.600000,0.600000,0.600000,0.600000,0.600000,2000
    2016-06-16,WG,0.600000,0.600000,0.600000,0.600000,0.600000,0
    2016-06-17,WG,0.600000,0.600000,0.600000,0.600000,0.600000,0
    2016-06-20,WG,0.600000,0.600000,0.600000,0.600000,0.600000,0
    2016-06-21,WG,0.600000,0.600000,0.600000,0.600000,0.600000,0
    2016-06-22,WG,0.600000,0.600000,0.600000,0.600000,0.600000,0
    2016-06-23,WG,0.600000,0.600000,0.600000,0.600000,0.600000,0
    2016-06-24,WG,0.600000,0.600000,0.600000,0.600000,0.600000,0
    2016-06-27,WG,0.600000,0.600000,0.600000,0.600000,0.600000,800
    2016-06-28,WG,0.600000,0.600000,0.600000,0.600000,0.600000,0
    2016-06-29,WG,null,null,null,null,null,null
    2016-06-30,WG,null,null,null,null,null,null
    2016-07-01,WG,0.620000,0.650000,0.620000,0.650000,0.650000,40000
    2016-07-04,WG,null,null,null,null,null,null
    2016-07-05,WG,null,null,null,null,null,null
    
  4. My partial code:

    volume = df_a['Volume'] == 'null'
    df_a.loc[volume,'Volume'] = 0
    

    However, I can't work out how to continue with Open, High, Low, and Close.

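1 Answer:

Answer 0: (score: 0)

Part 1 (already answered by you)

For a vectorized implementation, it is best to first convert the 'null' values to NaN (a better solution may exist).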
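As a small illustration of that conversion step, here is a minimal sketch. It assumes the values arrived as strings, so that 'null' is literal text; the three-row frame is cut down from df_a in the question, and the `num_cols` list is only a name chosen for this example.

```python
import numpy as np
import pandas as pd

# Three rows taken from df_a; values are kept as strings on purpose to
# mimic a frame in which 'null' is literal text (an assumption).
df = pd.DataFrame({
    'Date': ['2016-06-28', '2016-06-29', '2016-07-01'],
    'Close': ['0.600000', 'null', '0.650000'],
    'Volume': ['0', 'null', '40000'],
})

num_cols = ['Close', 'Volume']
# Turn the 'null' strings into NaN, then convert the columns to numbers.
df[num_cols] = df[num_cols].replace('null', np.nan).apply(pd.to_numeric)

print(df.dtypes)
print(df['Close'].isna().tolist())
```

Once the columns are numeric and the gaps are real NaN values, methods such as `fillna`, `ffill`, and `shift` can be applied to whole columns at once.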

Part 3 (replace the first row)

    import numpy as np
    df = df.replace('null', np.nan)
    # df.iloc[0] can return a copy, so assign the filled row back
    df.iloc[0] = df.iloc[0].fillna(df.iloc[1].Open)

Part 2 (replace all null values with the previous Close value):

    # Forward-fill so each null Close takes the previous day's Close
    df['Close'] = df['Close'].ffill()
    prev_close = df['Close'].shift(1)
    for col in ['Low', 'Open', 'High']:
        df[col] = df[col].fillna(prev_close)
    print(df)
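Putting the three parts together, the following is one possible end-to-end sketch rather than a definitive implementation. The names `fill_nulls` and `PRICE_COLS` are hypothetical, the sample is a shortened version of df_b with Adj Close left out (the rules do not mention it), and `keep_default_na=False` is used only so that 'null' stays a literal string as in the question. It also assumes that when several leading rows are null, all of them should take the first valid Open.

```python
import io

import numpy as np
import pandas as pd

PRICE_COLS = ['Open', 'High', 'Low', 'Close']  # hypothetical helper name

def fill_nulls(df):
    """Apply the question's three rules and return a new DataFrame."""
    out = df.copy()
    cols = PRICE_COLS + ['Volume']
    # Convert literal 'null' strings to NaN and make the columns numeric.
    out[cols] = out[cols].replace('null', np.nan).apply(pd.to_numeric)
    # Rule 1: a missing Volume becomes 0.
    out['Volume'] = out['Volume'].fillna(0)
    # Rule 3: rows before the first valid Close take the first valid Open
    # (assumption: this covers more than one leading null row).
    lead = out['Close'].ffill().isna()
    first_open = out['Open'].bfill().iloc[0]
    out.loc[lead, PRICE_COLS] = first_open
    # Rule 2: remaining gaps follow the previous day's Close.
    out['Close'] = out['Close'].ffill()
    prev_close = out['Close'].shift(1)
    for col in ['Open', 'High', 'Low']:
        out[col] = out[col].fillna(prev_close)
    return out

# Shortened sample based on df_b (Adj Close omitted).
csv_text = """Date,Stock,Open,High,Low,Close,Volume
2016-06-10,WG,null,null,null,null,null
2016-06-13,WG,null,null,null,null,null
2016-06-14,WG,0.600000,0.600000,0.600000,0.600000,1000
2016-06-28,WG,0.600000,0.600000,0.600000,0.600000,0
2016-06-29,WG,null,null,null,null,null
2016-07-01,WG,0.620000,0.650000,0.620000,0.650000,40000
2016-07-04,WG,null,null,null,null,null
"""
df_b = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
result = fill_nulls(df_b)
print(result)
```

Note that `pd.read_csv` treats the string "null" as a missing value by default, so when the data comes straight from a CSV file the `replace` step may not be needed at all.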