Convert non-numeric values to NaN and conditionally forward-fill in Pandas

Time: 2017-07-24 17:14:34

Tags: python-3.x pandas

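My goal is to first convert every non-numeric value (e.g. a string) in the following DataFrame to NaN, and then forward-fill only those NaNs that come after a valid number has already appeared in that column. In other words, no column should contain a NaN between its numbers. Any answers are appreciated.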

Here is an example of the problem:

Date       Open       High   Low      Close
1/1/2015   529.795    peter    peter    peter
1/2/2015   527.561    peter  522.665  523.373
1/5/2015   521.827    peter  511.655  512.463
1/6/2015   513.590    peter  499.678  500.585
1/7/2015   505.612  505.855  498.282  499.728
1/8/2015   496.626  502.101  HELLO    501.303
1/9/2015   503.378  503.537  493.435  494.811
1/12/2015  493.585  494.618  486.225  491.201
1/13/2015  497.474  501.603  491.042  494.821
1/14/2015  493.295  501.852  491.65   499.498
1/15/2015  504.186  504.295  496.397  PETER
1/16/2015  498.641  506.798  498.631  506.689
1/19/2015  498.641  506.798  498.631  506.689
1/20/2015  509.601  511.097   504.63  505.512        

My desired output is as follows:

date        open     high      low    close
1/1/2015   529.795    NaN    NaN      NaN
1/2/2015   527.561    NaN    522.665  523.373
1/5/2015   521.827    NaN    511.655  512.463
1/6/2015   513.590    NaN    499.678  500.585
1/7/2015   505.612  505.855  498.282  499.728
1/8/2015   496.626  502.101  498.282  501.303
1/9/2015   503.378  503.537  493.435  494.811
1/12/2015  493.585  494.618  486.225  491.201
1/13/2015  497.474  501.603  491.042  494.821
1/14/2015  493.295  501.852   491.65  499.498
1/15/2015  504.186  504.295  496.397  499.498
1/16/2015  498.641  506.798  498.631  506.689
1/19/2015  498.641  506.798  498.631  506.689
1/20/2015  509.601  511.097   504.63  505.512
1/21/2015  505.861  517.858  504.814  516.621
1/22/2015  520.052  534.861  518.277  532.927
1/23/2015  534.123  540.685   531.54  538.471
1/26/2015  537.055  537.524   528.22  533.744
1/27/2015  528.519  529.247  516.771   517.21
1/28/2015  521.348  521.558  508.603  508.603

1 answer:

Answer 0 (score: 1)

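Use set_index, pd.to_numeric with errors='coerce', and ffill(). set_index keeps the Date column out of the conversion, pd.to_numeric with errors='coerce' turns every non-numeric value into NaN, and ffill() propagates the last valid value downward. NaNs at the top of a column stay NaN because there is no earlier value to propagate: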

df.set_index('Date').apply(pd.to_numeric, errors='coerce').ffill().reset_index()
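
For anyone who wants to try the one-liner in isolation, here is a minimal self-contained sketch. The DataFrame is a hand-typed subset of the question's data (four rows, three value columns), and the variable name `result` is only illustrative.

```python
import numpy as np
import pandas as pd

# A small subset of the question's data; the strings are the bad values.
df = pd.DataFrame({
    "Date": ["1/1/2015", "1/2/2015", "1/7/2015", "1/8/2015"],
    "Open": [529.795, 527.561, 505.612, 496.626],
    "High": ["peter", "peter", 505.855, 502.101],
    "Low": ["peter", 522.665, 498.282, "HELLO"],
})

result = (
    df.set_index("Date")                      # keep the dates out of the conversion
      .apply(pd.to_numeric, errors="coerce")  # non-numeric values become NaN
      .ffill()                                # carry the last valid value downward
      .reset_index()                          # restore Date as a regular column
)
print(result)
```

The leading "peter" entries in High and Low remain NaN, while "HELLO" in Low is replaced by the value from the row above it.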

@cmaher's solution may be faster:

df.iloc[:,1:] = df.iloc[:,1:].apply(pd.to_numeric, errors='coerce').ffill()
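
One thing worth checking with the slice assignment is the resulting dtype: depending on the pandas version, assigning into `df.iloc[:, 1:]` may write the values into the existing object columns instead of replacing them, so the columns may still report `object`. The following is a sketch of a column-by-column variant (not part of the original answer) that replaces each column outright. The three-row DataFrame is a hand-typed subset of the question's data.

```python
import pandas as pd

# Three rows from the question; "PETER" is the non-numeric value.
df = pd.DataFrame({
    "Date": ["1/14/2015", "1/15/2015", "1/16/2015"],
    "Open": [493.295, 504.186, 498.641],
    "Close": [499.498, "PETER", 506.689],
})

# Convert and forward-fill every column except Date, one column at a time.
# Assigning a whole column replaces it, so the new float dtype is kept.
for col in df.columns[1:]:
    df[col] = pd.to_numeric(df[col], errors="coerce").ffill()

print(df.dtypes)
```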

Output:

         Date     Open     High      Low    Close
0    1/1/2015  529.795      NaN      NaN      NaN
1    1/2/2015  527.561      NaN  522.665  523.373
2    1/5/2015  521.827      NaN  511.655  512.463
3    1/6/2015  513.590      NaN  499.678  500.585
4    1/7/2015  505.612  505.855  498.282  499.728
5    1/8/2015  496.626  502.101  498.282  501.303
6    1/9/2015  503.378  503.537  493.435  494.811
7   1/12/2015  493.585  494.618  486.225  491.201
8   1/13/2015  497.474  501.603  491.042  494.821
9   1/14/2015  493.295  501.852  491.650  499.498
10  1/15/2015  504.186  504.295  496.397  499.498
11  1/16/2015  498.641  506.798  498.631  506.689
12  1/19/2015  498.641  506.798  498.631  506.689
13  1/20/2015  509.601  511.097  504.630  505.512
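
To confirm that a result meets the requirement in the question (no NaN between the numbers of a column), a small check can look only at the part of each column from its first valid value onward. This is a sketch. The helper name `has_interior_nan` is made up for illustration, and the two Series are toy data.

```python
import numpy as np
import pandas as pd

def has_interior_nan(s):
    """Return True if the Series has a NaN after its first valid value."""
    first = s.first_valid_index()
    if first is None:          # the column is entirely NaN
        return False
    return bool(s.loc[first:].isna().any())

# Leading NaNs only: acceptable under the question's requirement.
high = pd.Series([np.nan, np.nan, 505.855, 502.101])
# A NaN between two numbers: this is what ffill() is meant to remove.
low_raw = pd.Series([np.nan, 522.665, np.nan, 493.435])

print(has_interior_nan(high))
print(has_interior_nan(low_raw))
print(has_interior_nan(low_raw.ffill()))
```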