Checking for or ignoring NaN values

Time: 2018-07-31 09:23:56

Tags: python pandas

I have a folder containing multiple CSV files. Every file has the same header:

date,total_cost,total_pnl_pre,total_pnl_pos,total_pnl_per_pre,total_pnl_per_pos

A typical CSV looks like this:

date,total_cost,total_pnl_pre,total_pnl_pos,total_pnl_per_pre,total_pnl_per_pos
2015-07-27,-0.0,0.0,0.0,0.0,0.0
2015-07-28,-0.0,0.0,0.0,0.0,0.0
2015-07-29,-0.6738699251792465,0.0,-0.6738699251792465,-0.0,-0.027000000000000003
2015-07-30,-0.0,-123.88294424426506,-123.88294424426506,-4.961880089696313,-4.961880089696313
2015-07-31,-0.0,1.9275568497366795,1.9275568497366795,0.09627642044988116,0.09627642044988116

However, some of the files contain NaN values (see below):

date,total_cost,total_pnl_pre,total_pnl_pos,total_pnl_per_pre,total_pnl_per_pos
2015-07-27,-0.0,0.0,0.0,0.0,0.0
2015-07-28,-0.0,0.0,0.0,0.0,0.0
2015-07-29,NaN,NaN,NaN,0.0,0.0
2015-07-30,NaN,NaN,NaN,0.0,0.0
2015-07-31,NaN,NaN,NaN,0.0,0.0

I have two functions for processing these files, hit_rate and max_drawdown:

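import numpy as np
import pandas as pd

def hit_rate(array_like):
    # Ratio of positive entries to negative entries, with zeros excluded.
    seq = np.array(array_like)
    seq = seq[np.nonzero(seq)]
    total_num = len(seq)
    if total_num == 0:
        return -float('inf')
    pos_num = len(seq[seq > 0.0])
    neg_sum = total_num - pos_num
    if neg_sum == 0:
        return float('inf')
    return pos_num / neg_sum

def max_drawdown(ser):
    # Largest drop below the running maximum (0 if the series never falls).
    # pd.expanding_max has been removed from recent pandas releases;
    # Series.expanding().max() is the equivalent call.
    running_max = ser.expanding().max()
    cur_dd = ser - running_max
    return min(0, cur_dd.min())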

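The CSV data is read into the variables array_like and ser used by these functions. When the script encounters NaN values, it crashes. Is there a way to set NaN values to zero, or to ignore them, while processing the CSV?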
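To make the two options concrete, here is a minimal sketch of what "set to zero" and "ignore" could look like in pandas. It is an illustration under assumptions, not a confirmed fix: the data is loaded from an inline string copied from the NaN example above (a real script would presumably call pd.read_csv on each file path), and total_pnl_pos is picked arbitrarily as the column to process.

```python
import io

import pandas as pd

# Sample rows copied from the NaN example above. In a real script this
# would be pd.read_csv(path) for each file in the folder (assumption).
CSV_TEXT = """date,total_cost,total_pnl_pre,total_pnl_pos,total_pnl_per_pre,total_pnl_per_pos
2015-07-27,-0.0,0.0,0.0,0.0,0.0
2015-07-28,-0.0,0.0,0.0,0.0,0.0
2015-07-29,NaN,NaN,NaN,0.0,0.0
2015-07-30,NaN,NaN,NaN,0.0,0.0
2015-07-31,NaN,NaN,NaN,0.0,0.0
"""

# read_csv treats the literal string "NaN" as a missing value by default.
df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"], index_col="date")

# Count the missing values per column to see which files are affected.
nan_counts = df.isna().sum()

# Option 1: replace every NaN with zero.
filled = df.fillna(0.0)

# Option 2: ignore NaN by dropping it from the column being processed.
# The column choice here is arbitrary and only for illustration.
dropped = df["total_pnl_pos"].dropna()
```

Either filled["total_pnl_pos"] or dropped could then be passed to hit_rate or max_drawdown. Which one is appropriate depends on whether a missing value should count as "no profit or loss" (zero) or as "no observation" (dropped).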

Many thanks.

0 answers:

No answers