I have an array with missing values (NaN) in various places:
import numpy as np
import pandas as pd
x = np.arange(1,10).astype(float)
x[[0,1,6]] = np.nan
df = pd.Series(x)
print(df)
0 NaN
1 NaN
2 3.0
3 4.0
4 5.0
5 6.0
6 NaN
7 8.0
8 9.0
dtype: float64
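For each NaN, I want to take the value that follows it and divide it by 2, then propagate the halved value to the next consecutive NaN, so that I end up with:
0 0.75
1 1.5
2 3.0
3 4.0
4 5.0
5 6.0
6 4.0
7 8.0
8 9.0
dtype: float64
What I have tried so far does not seem to handle consecutive NaNs.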
Answer 0 (score: 3)
Another solution, using ffill() and bfill() (equivalent to fillna with method='ffill' and method='bfill'):
# reverse the Series and flag the NaNs
b = df[::-1].isnull()
# number the NaNs within each run of consecutive NaNs (1, 2, ...), multiply by 2, and replace 0 with 1
a = (b.cumsum() - b.cumsum().where(~b).ffill()).mul(2).replace({0:1})
print(a)
8 1
7 1
6 2
5 1
4 1
3 1
2 1
1 2
0 4
dtype: int32
print(df.bfill().div(a))
0 0.75
1 1.50
2 3.00
3 4.00
4 5.00
5 6.00
6 4.00
7 8.00
8 9.00
dtype: float64
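One thing to watch: the divisor above is 2 times the NaN's position within its run, which gives the required 2 and 4 for runs of one or two NaNs but 6 rather than 8 for a third consecutive NaN. A minimal sketch of a power-of-two variant follows; the function name `halve_backfill` and its layout are my own illustration, not part of the answer's code, and it assumes the intent is "divide by 2 once per step away from the next valid value".

```python
import numpy as np
import pandas as pd


def halve_backfill(s: pd.Series) -> pd.Series:
    """Fill each NaN with the next valid value divided by 2**k,
    where k is the NaN's distance (in rows) from that value.
    Hypothetical helper, written for illustration only."""
    # Reverse so that each run of NaNs comes after the value it borrows from.
    rev = s.iloc[::-1].reset_index(drop=True)
    is_nan = rev.isna()
    total = is_nan.cumsum()
    # Running NaN count frozen at the last non-NaN row; the difference
    # is the position of each NaN inside its run (0 for valid rows).
    k = total - total.where(~is_nan).ffill()
    filled = rev.ffill() / (2.0 ** k)
    # Restore the original order and index.
    out = filled.iloc[::-1]
    out.index = s.index
    return out


x = np.arange(1, 10).astype(float)
x[[0, 1, 6]] = np.nan
print(halve_backfill(pd.Series(x)))
```

NaNs at the very end of the Series have no following value, so this sketch leaves them as NaN.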
Timings (len(df) = 9k):
In [315]: %timeit (mat(df))
100 loops, best of 3: 11.3 ms per loop
In [316]: %timeit (jez(df1))
100 loops, best of 3: 2.52 ms per loop
Code for the timings:
import numpy as np
import pandas as pd
x = np.arange(1,10).astype(float)
x[[0,1,6]] = np.nan
df = pd.Series(x)
print(df)
df = pd.concat([df]*1000).reset_index(drop=True)
df1 = df.copy()
def jez(df):
    b = df[::-1].isnull()
    a = (b.cumsum() - b.cumsum().where(~b).ffill()).mul(2).replace({0:1})
    return (df.bfill().div(a))

def mat(df):
    prev = 0
    new_list = []
    for i in df.values[::-1]:
        if np.isnan(i):
            new_list.append(prev/2.)
            prev = prev / 2.
        else:
            new_list.append(i)
            prev = i
    return pd.Series(new_list[::-1])

print (mat(df))
print (jez(df1))
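The `%timeit` lines are IPython magics, so they only work in an IPython or Jupyter session. For a plain Python script, a rough equivalent can be sketched with the standard `timeit` module. The two functions are repeated here so the block runs on its own; the repeat counts and the name `data` are arbitrary choices of mine, and the measured numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def jez(s):
    # vectorized version: divisor built from the NaN run positions
    b = s[::-1].isnull()
    a = (b.cumsum() - b.cumsum().where(~b).ffill()).mul(2).replace({0: 1})
    return s.bfill().div(a)


def mat(s):
    # loop version: walk backwards, halving the previous value on each NaN
    prev = 0
    new_list = []
    for i in s.values[::-1]:
        if np.isnan(i):
            new_list.append(prev / 2.)
            prev = prev / 2.
        else:
            new_list.append(i)
            prev = i
    return pd.Series(new_list[::-1])


x = np.arange(1, 10).astype(float)
x[[0, 1, 6]] = np.nan
data = pd.concat([pd.Series(x)] * 1000).reset_index(drop=True)

# Sanity check before timing: both versions should agree on this data,
# where no run of NaNs is longer than two.
same = np.allclose(jez(data).sort_index().to_numpy(), mat(data).to_numpy())
print("results agree:", same)

for name, func in [("mat", mat), ("jez", jez)]:
    # best of 3 repeats, 10 calls each, reported per call
    best = min(timeit.repeat(lambda: func(data), number=10, repeat=3)) / 10
    print(f"{name}: {best * 1e3:.2f} ms per call")
```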
Answer 1 (score: 2)
You can do it like this:
import numpy as np
import pandas as pd
x = np.arange(1,10).astype(float)
x[[0,1,6]] = np.nan
df = pd.Series(x)
prev = 0
new_list = []
for i in df.values[::-1]:
    if np.isnan(i):
        new_list.append(prev/2.)
        prev = prev / 2.
    else:
        new_list.append(i)
        prev = i
df = pd.Series(new_list[::-1])
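It loops over the values of df in reverse order and keeps track of the previous value. If the current value is not NaN, it appends the actual value; otherwise it appends half of the previous value.
This may not be the most sophisticated pandas solution, but you can change the behavior quite easily.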
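To illustrate how easily the behavior can be changed, here is a sketch of the same loop wrapped in a function with the scaling exposed as a parameter. The name `fill_from_next` and the parameters `factor` and `start` are hypothetical, introduced only for this example. Unlike the loop above, which starts from `prev = 0`, this sketch defaults to NaN, so NaNs at the very end of the Series stay NaN unless a `start` value is passed.

```python
import numpy as np
import pandas as pd


def fill_from_next(s, factor=0.5, start=np.nan):
    """Walk the values backwards; each NaN becomes `factor` times the
    value that follows it. Hypothetical helper for illustration."""
    prev = start  # stands in for the missing "next value" of trailing NaNs
    out = []
    for v in s.to_numpy()[::-1]:
        if np.isnan(v):
            prev = prev * factor  # scale the value carried from the right
        else:
            prev = v  # a real value resets the carry
        out.append(prev)
    # reverse back and keep the original index
    return pd.Series(out[::-1], index=s.index)


x = np.arange(1, 10).astype(float)
x[[0, 1, 6]] = np.nan
print(fill_from_next(pd.Series(x)))              # halve on each step
print(fill_from_next(pd.Series(x), factor=1.0))  # behaves like a plain backward fill
```

Passing `start=0` reproduces the original loop's treatment of trailing NaNs, which turns them into 0.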