Backfill multiple consecutive NaNs in Pandas/Python?

Time: 2016-08-24 11:33:21

Tags: python pandas numpy dataframe interpolation

I have an array with missing values in various places.

import numpy as np
import pandas as pd
x = np.arange(1,10).astype(float)
x[[0,1,6]] = np.nan
df = pd.Series(x)
print(df)
0    NaN
1    NaN
2    3.0
3    4.0
4    5.0
5    6.0
6    NaN
7    8.0
8    9.0
dtype: float64

For each NaN, I want to take the value following it and divide it by 2, then propagate that to the next consecutive NaN, so that I end up with:

0    0.75
1    1.5
2    3.0
3    4.0
4    5.0
5    6.0
6    4.0
7    8.0
8    9.0
dtype: float64

I have tried …, but it doesn't seem to handle consecutive NaNs.
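Stated as a formula (a restatement of the expected output above, assuming "propagate" means each further NaN in a run halves the value again): a NaN at position i takes the first valid value after it and divides it by 2 once per step of distance. Positions with no valid value after them are not covered by this rule.

```latex
y_i =
\begin{cases}
x_i, & x_i \text{ is not NaN},\\[2pt]
\dfrac{x_{j(i)}}{2^{\,j(i)-i}}, & x_i \text{ is NaN},
\end{cases}
\qquad j(i) = \min\{\, j > i : x_j \text{ is not NaN} \,\}

% worked values for the example series
y_1 = \tfrac{3}{2^{1}} = 1.5, \quad y_0 = \tfrac{3}{2^{2}} = 0.75, \quad y_6 = \tfrac{8}{2^{1}} = 4
```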

2 Answers:

Answer 0 (score: 3)

Another solution, using fillna with method ffill (equivalent to the ffill() function):

#reverse the order of the Series
b = df[::-1].isnull()
#count the consecutive NaNs (k = distance to the next valid value)
#and use 2**k as the divisor, so k = 0 gives 1
a = 2 ** (b.cumsum() - b.cumsum().where(~b).ffill())

print(a)
8    1.0
7    1.0
6    2.0
5    1.0
4    1.0
3    1.0
2    1.0
1    2.0
0    4.0
dtype: float64

print(df.bfill().div(a))
0    0.75
1    1.50
2    3.00
3    4.00
4    5.00
5    6.00
6    4.00
7    8.00
8    9.00
dtype: float64
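To check a vectorized version on longer gaps, a series with a run of three NaNs is a useful test: the first NaN of the run must be divided by 2**3 = 8, whereas a divisor that grows linearly (2*k, with 0 replaced by 1) would give 6 and only agrees with 2**k for runs of one or two NaNs. The sketch below builds the divisor as 2**k and compares it with a plain loop; the function names are hypothetical, and trailing NaNs are assumed to stay NaN.

```python
import numpy as np
import pandas as pd

def halve_backfill_vectorized(s: pd.Series) -> pd.Series:
    """Divide each back-filled NaN by 2**k, k = steps to the next valid value."""
    b = s[::-1].isnull()                             # NaN mask, read from the end
    k = b.cumsum() - b.cumsum().where(~b).ffill()    # position within each NaN run
    return s.bfill().div(2 ** k.reindex(s.index))    # divisor in original order

def halve_backfill_loop(s: pd.Series) -> pd.Series:
    """Reference loop: each NaN becomes half of the value after it."""
    prev = np.nan                                    # trailing NaNs stay NaN
    out = []
    for v in s.to_numpy()[::-1]:
        prev = prev / 2.0 if np.isnan(v) else v
        out.append(prev)
    return pd.Series(out[::-1], index=s.index)

# A run of three NaNs: the first one needs 16 / 2**3 = 2.0
s = pd.Series([np.nan, np.nan, np.nan, 16.0, 5.0, np.nan, 8.0])
vec = halve_backfill_vectorized(s)
ref = halve_backfill_loop(s)
print(vec.tolist())          # [2.0, 4.0, 8.0, 16.0, 5.0, 4.0, 8.0]
print(np.allclose(vec, ref))
```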

Timings (len(df)=9k):

In [315]: %timeit (mat(df))
100 loops, best of 3: 11.3 ms per loop

In [316]: %timeit (jez(df1))
100 loops, best of 3: 2.52 ms per loop

Code used for the timings:

import numpy as np
import pandas as pd
x = np.arange(1,10).astype(float)
x[[0,1,6]] = np.nan
df = pd.Series(x)
print(df)
df = pd.concat([df]*1000).reset_index(drop=True)
df1 = df.copy()

def jez(df):
    b = df[::-1].isnull()
    a = 2 ** (b.cumsum() - b.cumsum().where(~b).ffill())
    return (df.bfill().div(a))

def mat(df):
    prev = 0
    new_list = []
    for i in df.values[::-1]:
        if np.isnan(i):
            new_list.append(prev/2.)    
            prev = prev / 2.
        else:
            new_list.append(i)
            prev = i
    return pd.Series(new_list[::-1])

print (mat(df))
print (jez(df1))
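A further vectorized candidate for the comparison is sketched below. It is not taken from either answer and has not been timed here. The idea: label each position by how many valid values lie at or after it, so a NaN run shares its label with the valid value that ends it; then groupby(...).cumcount(ascending=False) gives the distance k to that value, and the back-filled value is divided by 2**k. The function name is hypothetical.

```python
import numpy as np
import pandas as pd

def halve_backfill_groupby(s: pd.Series) -> pd.Series:
    """Sketch: each NaN becomes next_valid / 2**k, k = steps to the next valid value."""
    # Count of valid values at or after each position; a NaN run and the
    # valid value that ends it get the same label.
    group = s.notna()[::-1].cumsum()[::-1]
    # Distance from each position to the last member of its group (0 for the valid value).
    dist = s.groupby(group).cumcount(ascending=False)
    # Trailing NaNs have no value to back-fill from, so they stay NaN.
    return s.bfill() / 2.0 ** dist

x = np.arange(1, 10).astype(float)
x[[0, 1, 6]] = np.nan
print(halve_backfill_groupby(pd.Series(x)))
```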

Answer 1 (score: 2)

You could do something like this:

import numpy as np
import pandas as pd
x = np.arange(1,10).astype(float)
x[[0,1,6]] = np.nan
df = pd.Series(x)

prev = 0
new_list = []
for i in df.values[::-1]:
    if np.isnan(i):
        new_list.append(prev/2.)    
        prev = prev / 2.
    else:
        new_list.append(i)
        prev = i
df = pd.Series(new_list[::-1])

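It loops over the values of df in reverse order and keeps track of the previous value (previous in reverse order, i.e. the next value in the original Series). If the current value is not NaN, it appends the actual value; otherwise it appends half of the previous value.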

This may not be the most sophisticated Pandas solution, but it makes the behavior easy to change.
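To illustrate how the behavior can be changed, here is the same reverse loop wrapped in a function with two knobs. The function name backfill_scaled and its factor and tail parameters are hypothetical, not part of pandas. factor replaces the fixed division by 2, and tail is the starting value of prev: tail=0.0 reproduces the loop above (trailing NaNs become 0.0), while the default np.nan leaves trailing NaNs as NaN. The original index is also kept.

```python
import numpy as np
import pandas as pd

def backfill_scaled(s: pd.Series, factor: float = 0.5, tail: float = np.nan) -> pd.Series:
    """Hypothetical helper generalizing the reverse loop.

    Walks the values from the end; each NaN becomes `factor` times the
    value after it. `tail` seeds the walk, so it decides trailing NaNs.
    """
    prev = tail
    out = []
    for v in s.to_numpy()[::-1]:
        prev = prev * factor if np.isnan(v) else v   # scale down through NaN runs
        out.append(prev)
    return pd.Series(out[::-1], index=s.index)       # restore original order and index

x = np.arange(1, 10).astype(float)
x[[0, 1, 6]] = np.nan
s = pd.Series(x)
print(backfill_scaled(s))              # halving, as in the question
print(backfill_scaled(s, factor=1.0))  # factor 1.0 behaves like a plain bfill()
```

With factor=1.0 every NaN simply copies the next valid value, so the same loop covers both the question's halving rule and an ordinary backfill.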