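I'm trying to iterate over the rows of a df and count consecutive rows where a value is NaN or 0, restarting the count once the value changes from NaN or 0 to a non-zero number. I'd like to get something like this: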
Value Period
0 1
0 2
0 3
NaN 4
21 NaN
4 NaN
0 1
0 2
NaN 3
I wrote a function that takes a dataframe as an argument and returns it with an extra column holding this count:
def calc_period(df):
    period_x = []
    sum_x = 0
    for i in range(1,df.shape[0]):
        if df.iloc[i,0] == np.nan or df.iloc[i,0] == 0:
            sum_x += 1
            period_x.append(sum_x)
        else:
            period_x.append(None)
            sum_x = 0
    period_x.append(sum_x)
    df['period_x'] = period_x
    return df
The function works fine when the value is 0. But when the value is NaN, the count is NaN as well, and I get the following result:
Value Period
0 1
0 2
0 3
NaN NaN
NaN NaN
Answer 0 (score: 2)
Here is a revised version of your code:
It contains 2 fixes, described after the code:
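import math

import numpy as np
import pandas as pd

def is_nan_or_zero(val):
    # NaN must be detected with math.isnan; `val == np.nan` is always False
    return math.isnan(val) or val == 0

def calc_period(df):
    # Handle the first row separately, since the loop below starts at the second row
    is_first_nan_or_zero = is_nan_or_zero(df.iloc[0, 0])
    period_x = [1 if is_first_nan_or_zero else np.nan]
    sum_x = 1 if is_first_nan_or_zero else 0
    for i in range(1, df.shape[0]):
        val = df.iloc[i, 0]
        if is_nan_or_zero(val):
            # Extend the current run of NaN/0 values
            sum_x += 1
            period_x.append(sum_x)
        else:
            # Any other value ends the run and resets the counter
            period_x.append(None)
            sum_x = 0
    df['period_x'] = period_x
    return df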
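First, df.iloc[i,0] == np.nan is replaced with math.isnan(val): NaN never compares equal to anything, including itself, so the == check is always False. Second, the first period value is added before the loop (because we start iterating from the second value).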
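
As a side note that is not part of the revised code above, here is a minimal sketch showing why the equality check fails, followed by one possible loop-free way to get the same counts. The helper name `period_counts` and the `Period` column name are my own choices for illustration, and I am assuming the values live in a single numeric column called `Value`, as in the question.

```python
import math

import numpy as np
import pandas as pd

# NaN is never equal to anything, itself included, so an == check cannot find it.
print(np.nan == np.nan)    # False
print(math.isnan(np.nan))  # True
print(pd.isna(np.nan))     # True


def period_counts(values: pd.Series) -> pd.Series:
    """Hypothetical helper: running count within each run of NaN-or-zero values."""
    hit = values.isna() | values.eq(0)
    # Every row that is neither NaN nor 0 bumps the id, so each unbroken run
    # of matching rows shares a single id.
    run_id = (~hit).cumsum()
    counts = hit.astype(int).groupby(run_id).cumsum()
    # Rows that are neither NaN nor 0 get NaN instead of a count.
    return counts.where(hit)


# Sample data taken from the question.
df = pd.DataFrame({"Value": [0, 0, 0, np.nan, 21, 4, 0, 0, np.nan]})
df["Period"] = period_counts(df["Value"])
print(df)
```

On the sample data this should reproduce the Period column you asked for (1, 2, 3, 4, NaN, NaN, 1, 2, 3). Whether it is preferable to the explicit loop depends on your data size and on how readable you find the groupby trick.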