I have a dataframe df with float values in column A. I want to add another column B such that:
1. B[0] = A[0]
   for i > 0 ...
2. B[i] = if(np.isnan(A[i])) then A[i] else Step 3
3. B[i] = if(abs((B[i-1] - A[i]) / B[i-1]) < 0.3) then B[i-1] else A[i]
A sample dataframe df can be generated as follows:
import numpy as np
import pandas as pd
df = pd.DataFrame(1000*(2+np.random.randn(500, 1)), columns=list('A'))
df.loc[1, 'A'] = np.nan
df.loc[15, 'A'] = np.nan
df.loc[240, 'A'] = np.nan
df.loc[241, 'A'] = np.nan
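To make the rule concrete, here is a minimal plain-Python sketch that applies it to a short list. The function name `apply_rule` and the sample values are illustrative assumptions, not part of the question; the 0.3 threshold comes from step 3.

```python
import math

def apply_rule(a, threshold=0.3):
    """Keep B[i-1] when A[i] is within `threshold` (relative) of it,
    otherwise take A[i]; a NaN in A is copied through unchanged."""
    b = [a[0]]                                   # step 1: B[0] = A[0]
    for i in range(1, len(a)):
        prev = b[i - 1]
        if math.isnan(a[i]):                     # step 2: NaN passes through
            b.append(a[i])
        elif abs((prev - a[i]) / prev) < threshold:   # step 3
            b.append(prev)
        else:
            b.append(a[i])
    return b

sample = [100.0, 120.0, float("nan"), 150.0, 160.0]   # made-up values
result = apply_rule(sample)
print(result)   # [100.0, 100.0, nan, 150.0, 150.0]
```

When B[i-1] is NaN, the comparison evaluates to False, so the row falls back to A[i]. A B[i-1] of exactly 0 would raise ZeroDivisionError with plain Python floats, and the question does not say what should happen in that case.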
Answer 0: (score: 2)
This can be done fairly efficiently with Numba. If you cannot use Numba, simply omit @njit and your logic will run as a Python-level loop.
import numpy as np
import pandas as pd
from numba import njit
np.random.seed(0)
df = pd.DataFrame(1000*(2+np.random.randn(500, 1)), columns=['A'])
df.loc[1, 'A'] = np.nan
df.loc[15, 'A'] = np.nan
df.loc[240, 'A'] = np.nan
@njit
def recurse_nb(x):
    out = x.copy()
    for i in range(1, x.shape[0]):
        if not np.isnan(x[i]) and (abs(1 - x[i] / out[i-1]) < 0.3):
            out[i] = out[i-1]
    return out
df['B'] = recurse_nb(df['A'].values)
print(df.head(10))
             A            B
0  3764.052346  3764.052346
1          NaN          NaN
2  2978.737984  2978.737984
3  4240.893199  4240.893199
4  3867.557990  4240.893199
5  1022.722120  1022.722120
6  2950.088418  2950.088418
7  1848.642792  1848.642792
8  1896.781148  1848.642792
9  2410.598502  2410.598502
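If the same script has to run both with and without Numba, one option is to fall back to a no-op decorator when the import fails. This is only a sketch: the stand-in `njit` defined in the `except` branch is a hypothetical helper, not part of Numba, and the input values are made up.

```python
import numpy as np

try:
    from numba import njit          # compiled path when Numba is installed
except ImportError:
    def njit(func):                 # hypothetical stand-in: returns func unchanged
        return func

@njit
def recurse_nb(x):
    out = x.copy()
    for i in range(1, x.shape[0]):
        # Reuse the previous output only when x[i] is a number within 30% of it.
        if not np.isnan(x[i]) and (abs(1 - x[i] / out[i-1]) < 0.3):
            out[i] = out[i-1]
    return out

values = np.array([100.0, 120.0, np.nan, 150.0, 160.0])   # made-up values
smoothed = recurse_nb(values)
print(smoothed)
```

Either way the function body is identical, so the results should match; only the speed differs.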
Answer 1: (score: 2)
I am not sure what should happen for the first B-1 and for the divide-by-NaN case:
df = pd.DataFrame([1,2,3,4,5,None,6,7,8,9,10], columns=['A'])
b1 = df.A.shift(1)
b1[0] = 1
b = list(map(lambda a,b1: a if np.isnan(a) else (b1 if abs(b1-a)/b1 < 0.3 else a), df.A, b1 ))
df['B'] = b
df
       A     B
0    1.0   1.0
1    2.0   2.0
2    3.0   3.0
3    4.0   4.0
4    5.0   4.0
5    NaN   NaN
6    6.0   6.0
7    7.0   6.0
8    8.0   7.0
9    9.0   8.0
10  10.0   9.0
Following @jpp's suggestion, you can also build the list b with a list comprehension:
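# `not ... < 0.3` (rather than `>= 0.3`) so that a NaN predecessor falls back to a, as in the map version
b = [a if np.isnan(a) or not abs(b-a)/b < 0.3 else b for a,b in zip(df.A,b1)]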
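One thing to be aware of: b1 is a shifted copy of A, so each row is compared with A[i-1] rather than with B[i-1] as the question asks. The sketch below runs both variants on the same data so the difference is visible. The helper `pick` and the variable names are illustrative assumptions, and `fill_value=1.0` stands in for `b1[0] = 1`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 3, 4, 5, None, 6, 7, 8, 9, 10], columns=['A'])
a = df['A'].to_numpy()

def pick(x, p):
    # A NaN in x or p makes the comparison False, so x is returned.
    return p if abs(p - x) / p < 0.3 else x

# Shift-based variant: the predecessor is the previous value of A.
prev_a = df['A'].shift(1, fill_value=1.0).to_numpy()
shift_b = [pick(x, p) for x, p in zip(a, prev_a)]

# Recursive variant: the predecessor is the previous value of B.
rec_b = [a[0]]
for x in a[1:]:
    rec_b.append(pick(x, rec_b[-1]))

# Rows where the two variants disagree (rows that are NaN in both are ignored).
differs = [i for i, (s, r) in enumerate(zip(shift_b, rec_b))
           if not (np.isnan(s) and np.isnan(r)) and s != r]
print(differs)   # [8, 10]
```

On this data the two agree up to row 7 and then diverge, because once B holds on to an older value the comparison base is no longer A[i-1].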
Answer 2: (score: 1)
Below is a simple solution I could think of. I wonder whether there is a more Pythonic way:
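a = df['A'].values
b = []
b.append(a[0])  # B[0] = A[0]
for i in range(1, len(a)):
    if np.isnan(a[i]):
        # NaN in A is copied to B unchanged
        b.append(a[i])
    else:
        # keep the previous B when A[i] is within 30% of it, otherwise take A[i]
        b.append(b[i-1] if abs(1 - a[i]/b[i-1]) < 0.3 else a[i])
df['B'] = b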
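On the question of a more Pythonic form: since each B[i] depends only on B[i-1] and A[i], one possibility is `itertools.accumulate` with a two-argument step function. This is a sketch under that assumption; the name `step` is illustrative, and the sample values are rounded from the output shown in Answer 0.

```python
import math
from itertools import accumulate

def step(prev_b, cur_a):
    """Return B[i] given B[i-1] (prev_b) and A[i] (cur_a)."""
    if math.isnan(cur_a):
        return cur_a
    # A NaN prev_b makes the comparison False, so cur_a is returned.
    return prev_b if abs(1 - cur_a / prev_b) < 0.3 else cur_a

a = [3764.05, float("nan"), 2978.74, 4240.89, 3867.56, 1022.72]
b = list(accumulate(a, step))   # the first element is passed through as B[0]
print(b)   # [3764.05, nan, 2978.74, 4240.89, 4240.89, 1022.72]
```

It is still a Python-level loop under the hood, so it is unlikely to be faster than the explicit version; the gain is mostly in readability.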
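Answer 3: (score: 0)
So, this may be faster on real-world data, but there is also a genuine worst case (if row 0 >> the rest of the data, the while loop will iterate N times).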
df['B'] = df['A']
to_be_fixed = pd.Series(True, index=df.index)
while to_be_fixed.any():
    # Shift column B and the rows that need to be logically tested
    diff = df['B'].shift(1)
    to_be_fixed = to_be_fixed.shift(1)
    # Test the rows to see which need to be replaced
    to_be_fixed = to_be_fixed & (np.abs(1 - df['A'] / diff) < 0.3)
    # Replace data
    df.loc[to_be_fixed, 'B'] = diff.loc[to_be_fixed]
    # Fix np.nan that has been introduced into column B
    b_na = pd.isnull(df['B'])
    df.loc[b_na, 'B'] = df.loc[b_na, 'A']
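Before relying on this approach, it may be worth checking it against the plain loop on a small input. The sketch below wraps both in functions; the names `reference_loop` and `shift_loop` and the three-row input are illustrative assumptions, and `fill_value=False` is added here only to keep the mask boolean after the shift.

```python
import numpy as np
import pandas as pd

def reference_loop(a):
    # Row-by-row rule: compare A[i] with the previous value of B.
    b = [a[0]]
    for i in range(1, len(a)):
        if np.isnan(a[i]):
            b.append(a[i])
        else:
            b.append(b[i-1] if abs(1 - a[i] / b[i-1]) < 0.3 else a[i])
    return np.array(b)

def shift_loop(a):
    # The while loop from this answer, wrapped in a function.
    df = pd.DataFrame({'A': a})
    df['B'] = df['A']
    to_be_fixed = pd.Series(True, index=df.index)
    while to_be_fixed.any():
        diff = df['B'].shift(1)
        to_be_fixed = to_be_fixed.shift(1, fill_value=False)  # assumption: keep bool dtype
        to_be_fixed = to_be_fixed & (np.abs(1 - df['A'] / diff) < 0.3)
        df.loc[to_be_fixed, 'B'] = diff.loc[to_be_fixed]
        b_na = pd.isnull(df['B'])
        df.loc[b_na, 'B'] = df.loc[b_na, 'A']
    return df['B'].to_numpy()

a = np.array([100.0, 120.0, 150.0])   # made-up values
expected = reference_loop(a)
actual = shift_loop(a)
print(expected.tolist())   # [100.0, 100.0, 150.0]
print(actual.tolist())     # [100.0, 100.0, 120.0]
```

On this input the two disagree in the last row: it is replaced by 120 in the first pass, and it is not reset to A when its updated predecessor (100) no longer passes the 30% test. So the loop may need an extra step that restores A for such rows before it matches the recursive rule.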