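I have a set of data in which some of the datasets contain 'null' rows that need to be corrected.
These are the rules for correcting the data in the pandas DataFrame:
When Volume is null, change it to 0.
Open, High, Low and Close follow the previous day's close. That is, the null row for 2016-6-29 follows the 2016-6-28 close of 0.6.
If the first row is null, then Volume = 0, and Open, High, Low and Close follow the next day's open value.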
>>df_a
Date,Stock,Open,High,Low,Close,Adj Close,Volume
2016-06-22,AWG,0.600000,0.600000,0.600000,0.600000,0.600000,0
2016-06-23,AWG,0.600000,0.600000,0.600000,0.600000,0.600000,0
2016-06-24,AWG,0.600000,0.600000,0.600000,0.600000,0.600000,0
2016-06-27,AWG,0.600000,0.600000,0.600000,0.600000,0.600000,800
2016-06-28,AWG,0.600000,0.600000,0.600000,0.600000,0.600000,0
2016-06-29,AWG,null,null,null,null,null,null
2016-06-30,AWG,null,null,null,null,null,null
2016-07-01,AWG,0.620000,0.650000,0.620000,0.650000,0.650000,40000
2016-07-04,AWG,null,null,null,null,null,null
2016-07-05,AWG,null,null,null,null,null,null
2016-07-07,AWG,0.625000,0.650000,0.565000,0.650000,0.650000,3000
2016-07-08,AWG,0.650000,0.650000,0.650000,0.650000,0.650000,0
2016-07-11,AWG,0.650000,0.650000,0.605000,0.605000,0.605000,6000
2016-07-12,AWG,0.640000,0.640000,0.640000,0.640000,0.640000,3300
>>df_b
Date,Stock,Open,High,Low,Close,Adj Close,Volume
2016-06-10,WG,null,null,null,null,null,null
2016-06-13,WG,null,null,null,null,null,null
2016-06-14,WG,0.600000,0.600000,0.600000,0.600000,0.600000,1000
2016-06-15,WG,0.600000,0.600000,0.600000,0.600000,0.600000,2000
2016-06-16,WG,0.600000,0.600000,0.600000,0.600000,0.600000,0
2016-06-17,WG,0.600000,0.600000,0.600000,0.600000,0.600000,0
2016-06-20,WG,0.600000,0.600000,0.600000,0.600000,0.600000,0
2016-06-21,WG,0.600000,0.600000,0.600000,0.600000,0.600000,0
2016-06-22,WG,0.600000,0.600000,0.600000,0.600000,0.600000,0
2016-06-23,WG,0.600000,0.600000,0.600000,0.600000,0.600000,0
2016-06-24,WG,0.600000,0.600000,0.600000,0.600000,0.600000,0
2016-06-27,WG,0.600000,0.600000,0.600000,0.600000,0.600000,800
2016-06-28,WG,0.600000,0.600000,0.600000,0.600000,0.600000,0
2016-06-29,WG,null,null,null,null,null,null
2016-06-30,WG,null,null,null,null,null,null
2016-07-01,WG,0.620000,0.650000,0.620000,0.650000,0.650000,40000
2016-07-04,WG,null,null,null,null,null,null
2016-07-05,WG,null,null,null,null,null,null
My partial code:
volume = df_a['Volume'] == 'null'
df_a.loc[volume,'Volume'] = 0
However, I cannot work out how to continue with Open, High, Low and Close.
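One detail about the partial code above: after the assignment, Volume still holds a mix of strings and the integer 0. A minimal sketch of a numeric conversion, assuming the 'null' markers are literal strings; the three-row frame is made up for illustration and only mirrors values from the question:

```python
import pandas as pd

# Hypothetical three-row frame; the values mirror the question's Volume column.
df_a = pd.DataFrame({"Volume": ["800", "null", "40000"]})

# errors="coerce" turns anything non-numeric ('null' here) into NaN,
# which fillna then replaces with 0 before the cast to int.
df_a["Volume"] = pd.to_numeric(df_a["Volume"], errors="coerce").fillna(0).astype(int)
print(df_a["Volume"].tolist())  # → [800, 0, 40000]
```

With a numeric column, later steps such as sums or comparisons on Volume behave as expected.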
Answer 0: (score: 0)
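Part 1 (already answered by you)
For a vectorized implementation, it is better to convert the 'null' strings to NaN first (a better solution may exist):
import numpy as np
cols = ['Open', 'High', 'Low', 'Close']
df = df.replace('null', np.nan)
df[cols] = df[cols].astype(float)  # the price columns hold strings until converted
Part 2 (replace all nulls with the previous close value):
prev_close = df['Close'].ffill()  # last valid close at or before each row
for col in cols:
    df[col] = df[col].fillna(prev_close)
Part 3 (replace the leading null rows; run this after Part 2 so that only the leading rows are still NaN):
first_open = df['Open'].bfill()  # next valid open for the leading rows
for col in cols:
    df[col] = df[col].fillna(first_open)
print(df)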
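The three rules from the question can also be combined into one helper. This is a minimal sketch, not a definitive implementation: the function name fill_null_rows and the constant PRICE_COLS are hypothetical, the sample rows are an abridged selection from df_b, and Adj Close is left untouched because the question gives no rule for it.

```python
import io

import pandas as pd

PRICE_COLS = ["Open", "High", "Low", "Close"]  # hypothetical constant name


def fill_null_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the question's three rules applied."""
    out = df.copy()
    # 'null' strings (and anything else non-numeric) become NaN.
    num_cols = PRICE_COLS + ["Volume"]
    out[num_cols] = out[num_cols].apply(pd.to_numeric, errors="coerce")

    # Rule 1: a missing volume becomes 0.
    out["Volume"] = out["Volume"].fillna(0).astype(int)

    # Rule 2: gaps take the most recent valid close.
    prev_close = out["Close"].ffill()
    # Rule 3: rows before the first valid row take the next valid open.
    next_open = out["Open"].bfill()
    for col in PRICE_COLS:
        # prev_close is NaN only for leading rows, so next_open applies only there.
        out[col] = out[col].fillna(prev_close).fillna(next_open)
    return out


# Abridged rows from df_b in the question.
csv_text = """Date,Stock,Open,High,Low,Close,Adj Close,Volume
2016-06-10,WG,null,null,null,null,null,null
2016-06-13,WG,null,null,null,null,null,null
2016-06-14,WG,0.600000,0.600000,0.600000,0.600000,0.600000,1000
2016-06-29,WG,null,null,null,null,null,null
2016-07-01,WG,0.620000,0.650000,0.620000,0.650000,0.650000,40000
2016-07-04,WG,null,null,null,null,null,null
"""
# keep_default_na=False keeps 'null' as a literal string, as in the question;
# by default read_csv would already parse 'null' as NaN.
df_b = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(fill_null_rows(df_b))
```

In this sample the two leading rows take 0.6 from the 2016-06-14 open, the 2016-06-29 row takes 0.6 from the preceding close, and the 2016-07-04 row takes 0.65 from the 2016-07-01 close. Both fill series are computed before any column is modified, so the result does not depend on the order in which the columns are processed.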