Replace values in a pandas column when another column has N consecutive NaNs

Time: 2018-06-12 03:52:06

Tags: python pandas

I have this pandas DataFrame:

2018-05-25  0.000381  0.264318     land    2018-05-25
2018-05-26  0.000000  0.264447     land    2018-05-26
2018-05-27  0.000000  0.264791     NaN           NaT
2018-05-28  0.000000  0.265253     NaN           NaT
2018-05-29  0.000000  0.265720     NaN           NaT
2018-05-30  0.000000  0.266066     land    2018-05-30
2018-05-31  0.000000  0.266150     NaN           NaT
2018-06-01  0.000000  0.265816     NaN           NaT
2018-06-02  0.000000  0.264892     land    2018-06-02
2018-06-03  0.000000  0.263191     NaN           NaT
2018-06-04  0.000000  0.260508     land    2018-06-04
2018-06-05  0.000000  0.256619     NaN           NaT
2018-06-06  0.000000  0.251286     NaN           NaT
2018-06-07  0.000000  0.244250     NaN           NaT
2018-06-08  0.000000  0.235231     NaN           NaT
2018-06-09  0.000000  0.223932     land    2018-06-09

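When the 4th column contains 3 or more consecutive NaN values, I want to replace the value in the 3rd column with NaN, starting from the third NaN of each run. The output should look like this: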

2018-05-25  0.000381  0.264318     land    2018-05-25
2018-05-26  0.000000  0.264447     land    2018-05-26
2018-05-27  0.000000  0.264791     NaN           NaT
2018-05-28  0.000000  0.265253     NaN           NaT
2018-05-29  0.000000  NaN          NaN           NaT
2018-05-30  0.000000  0.266066     land    2018-05-30
2018-05-31  0.000000  0.266150     NaN           NaT
2018-06-01  0.000000  0.265816     NaN           NaT
2018-06-02  0.000000  0.264892     land    2018-06-02
2018-06-03  0.000000  0.263191     NaN           NaT
2018-06-04  0.000000  0.260508     land    2018-06-04
2018-06-05  0.000000  0.256619     NaN           NaT
2018-06-06  0.000000  0.251286     NaN           NaT
2018-06-07  0.000000  NaN          NaN           NaT
2018-06-08  0.000000  NaN          NaN           NaT
2018-06-09  0.000000  0.223932     land    2018-06-09

It would also be fine if, instead of being replaced with NaN, those rows were dropped entirely.

2 answers:

Answer 0 (score: 3)

Here is one approach, where n is the number of consecutive nulls required:

import numpy as np

n = 3
# Boolean mask: True where column 3 is null
x = df[3].isnull()
# Running length of the current null run; resets to 0 at every non-null row
se = x.cumsum() - x.cumsum().where(~x).ffill().fillna(0)

# Set column 2 to NaN from the n-th consecutive null onward
df.loc[se >= n, 2] = np.nan

             0         1         2     3           4
0   2018-05-25  0.000381  0.264318  land  2018-05-25
1   2018-05-26  0.000000  0.264447  land  2018-05-26
2   2018-05-27  0.000000  0.264791   NaN         NaT
3   2018-05-28  0.000000  0.265253   NaN         NaT
4   2018-05-29  0.000000       NaN   NaN         NaT
5   2018-05-30  0.000000  0.266066  land  2018-05-30
6   2018-05-31  0.000000  0.266150   NaN         NaT
7   2018-06-01  0.000000  0.265816   NaN         NaT
8   2018-06-02  0.000000  0.264892  land  2018-06-02
9   2018-06-03  0.000000  0.263191   NaN         NaT
10  2018-06-04  0.000000  0.260508  land  2018-06-04
11  2018-06-05  0.000000  0.256619   NaN         NaT
12  2018-06-06  0.000000  0.251286   NaN         NaT
13  2018-06-07  0.000000       NaN   NaN         NaT
14  2018-06-08  0.000000       NaN   NaN         NaT
15  2018-06-09  0.000000  0.223932  land  2018-06-09
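
For the alternative mentioned in the question (dropping the rows instead of writing NaN), the same counter can be used as a row filter. The following is a minimal sketch on a small made-up frame; the column names `value` and `label` and the sample data are assumptions for illustration, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 'label' has one run of three NaNs and one run of one.
df = pd.DataFrame({
    "value": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
    "label": ["land", np.nan, np.nan, np.nan, "land", np.nan, "land"],
})

n = 3
x = df["label"].isnull()
csum = x.cumsum()
# Length of the current run of nulls; 0 on non-null rows.
run_len = csum - csum.where(~x).ffill().fillna(0)

# Keep every row that is not the n-th (or later) null of a run.
dropped = df[run_len < n]
print(dropped)
```

Only the row at index 3 (the third consecutive null) is removed here; the isolated null at index 5 is kept because its run is shorter than `n`.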

Answer 1 (score: 2)

Edit: a more general approach that works for any threshold of consecutive NaNs:

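threshold = 3
# True where column d has a value; each True starts a new group below
mask = df.d.notna()
# Count consecutive NaNs within each group and blank column c once the threshold is reached
df.loc[(~mask).groupby(mask.cumsum()).transform('cumsum') >= threshold, 'c'] = np.nan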
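
To see why the grouped cumulative sum works, it may help to look at the intermediate values on a short column. This is only a sketch: the Series `d` below is made-up data standing in for column d, and the explicit cast to int is an addition for clarity rather than part of the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for column d.
d = pd.Series(["land", np.nan, np.nan, np.nan, "land", np.nan])

mask = d.notna()
# Each non-null row starts a new group; the nulls that follow share its id.
group_id = mask.cumsum()
# Within each group, count the nulls seen so far.
run_len = (~mask).astype(int).groupby(group_id).cumsum()

print(pd.DataFrame({"d": d, "group_id": group_id, "run_len": run_len}))
```

Comparing `run_len >= threshold` then selects the third and any later null of each run, which matches the output requested in the question.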

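For a fixed run of three, you can simply check that the row itself and the column shifted by one and by two rows are all null (I named the columns a-e):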

df.loc[df.d.isnull() & df.d.shift().isnull() & df.d.shift(2).isnull(), 'c'] = np.nan

# Result:

             a         b         c     d           e
0   2018-05-25  0.000381  0.264318  land  2018-05-25
1   2018-05-26  0.000000  0.264447  land  2018-05-26
2   2018-05-27  0.000000  0.264791   NaN         NaT
3   2018-05-28  0.000000  0.265253   NaN         NaT
4   2018-05-29  0.000000       NaN   NaN         NaT
5   2018-05-30  0.000000  0.266066  land  2018-05-30
6   2018-05-31  0.000000  0.266150   NaN         NaT
7   2018-06-01  0.000000  0.265816   NaN         NaT
8   2018-06-02  0.000000  0.264892  land  2018-06-02
9   2018-06-03  0.000000  0.263191   NaN         NaT
10  2018-06-04  0.000000  0.260508  land  2018-06-04
11  2018-06-05  0.000000  0.256619   NaN         NaT
12  2018-06-06  0.000000  0.251286   NaN         NaT
13  2018-06-07  0.000000       NaN   NaN         NaT
14  2018-06-08  0.000000       NaN   NaN         NaT
15  2018-06-09  0.000000  0.223932  land  2018-06-09
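
The chained shift check hard-codes a run length of three. One way to generalize it (a sketch, not part of the original answer) is a rolling sum over the null mask: a row is the n-th or later null of a run exactly when the last n rows are all null. The frame below is made-up sample data that reuses the column names c and d.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the answer's column names c and d.
df = pd.DataFrame({
    "c": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "d": ["land", np.nan, np.nan, np.nan, np.nan, "land"],
})

n = 3
is_null = df["d"].isnull().astype(int)
# The rolling sum equals n only when the current row and the n-1 rows before it are all null.
hit = is_null.rolling(n).sum() == n
df.loc[hit, "c"] = np.nan
print(df)
```

The first n-1 rows of the rolling window yield NaN, which compares unequal to `n`, so they are never selected.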