Loop to multiply by the previous value in pandas

Time: 2017-03-20 21:00:18

Tags: pandas

I have a pandas DataFrame that looks like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'origin_dte':['2009-08-01','2009-08-01','2009-08-01','2009-08-01','2009-09-01','2009-09-01','2009-09-01'],
                   'date':['2009-08-01','2009-08-02','2009-08-03','2009-08-04','2009-09-01','2009-09-02','2009-09-03'],
                   'bal_pred':[10.,11.,12.,13.,21.,22.,23.],
                   'dbal_pred':[np.nan,.25,.3,.5,np.nan,.4,.45]})

    bal_pred   date   dbal_pred origin_dte
0   10      2009-08-01  NaN     2009-08-01
1   11      2009-08-02  0.25    2009-08-01
2   12      2009-08-03  0.30    2009-08-01
3   13      2009-08-04  0.50    2009-08-01
4   21      2009-09-01  NaN     2009-09-01
5   22      2009-09-02  0.40    2009-09-01
6   23      2009-09-03  0.45    2009-09-01

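I want to loop over every observation of bal_pred and, wherever dbal_pred != NaN, replace it with dbal_pred[i] * bal_pred[i-1], using the already updated previous value. For example, the second value of bal_pred would become 0.25*10=2.5. When origin_dte changes, meaning dbal_pred is NaN again, the calculation skips that bal_pred observation and moves on to the next one. So df would end up looking like this:

   bal_pred        date  dbal_pred  origin_dte
0    10.000  2009-08-01        NaN  2009-08-01
1     2.500  2009-08-02       0.25  2009-08-01
2     0.750  2009-08-03       0.30  2009-08-01
3     0.375  2009-08-04       0.50  2009-08-01
4    21.000  2009-09-01        NaN  2009-09-01
5     8.400  2009-09-02       0.40  2009-09-01
6     3.780  2009-09-03       0.45  2009-09-01

I have a while loop that does this, but the problem is that looping over a large DataFrame takes a long time. A simpler/faster way would be much appreciated!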
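
The asker's own loop is not preserved on this page, so the following is only a minimal sketch of the row-by-row logic described in the question, useful as a slow reference to check faster solutions against. The function name loop_bal_pred is hypothetical, and the sketch assumes the rows are already ordered so that each NaN in dbal_pred starts a new run.

```python
import numpy as np
import pandas as pd

def loop_bal_pred(df):
    """Row-by-row reference: bal_pred[i] = dbal_pred[i] * bal_pred[i-1]
    wherever dbal_pred[i] is not NaN; NaN rows keep their bal_pred."""
    # Work on copies so the caller's DataFrame is not modified
    bal = df['bal_pred'].to_numpy(dtype=float).copy()
    dbal = df['dbal_pred'].to_numpy(dtype=float)
    for i in range(1, len(bal)):
        if not np.isnan(dbal[i]):
            # Uses the already updated previous balance
            bal[i] = dbal[i] * bal[i - 1]
    return df.assign(bal_pred=bal)

# Sample data from the question (date columns omitted for brevity)
df = pd.DataFrame({'bal_pred': [10., 11., 12., 13., 21., 22., 23.],
                   'dbal_pred': [np.nan, .25, .3, .5, np.nan, .4, .45]})
expected = loop_bal_pred(df)
print(expected)
```

On the sample data this gives bal_pred values of 10, 2.5, 0.75, 0.375, 21, 8.4 and 3.78 (up to floating-point rounding), matching the desired table.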

2 answers:

Answer 0 (score: 3)

Another approach is to label each group of rows and then take the cumulative product within each group

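# label each run: a new group starts at every NaN in dbal_pred
group = df['dbal_pred'].isnull().cumsum()
# seed each group with its starting bal_pred; df['dbal_pred'] itself stays untouched
factors = df['dbal_pred'].fillna(df['bal_pred'])
df['bal_pred'] = factors.groupby(group).cumprod()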

Output

   bal_pred        date  dbal_pred  origin_dte
0    10.000  2009-08-01        NaN  2009-08-01
1     2.500  2009-08-02       0.25  2009-08-01
2     0.750  2009-08-03       0.30  2009-08-01
3     0.375  2009-08-04       0.50  2009-08-01
4    21.000  2009-09-01        NaN  2009-09-01
5     8.400  2009-09-02       0.40  2009-09-01
6     3.780  2009-09-03       0.45  2009-09-01
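
To see why the grouped cumulative product works, it can help to look at the intermediate values on the sample data. The sketch below is illustrative only: the names run_id and factors are not from the answer, and it assumes the first row of every run has NaN in dbal_pred (otherwise the first run would start from a multiplier instead of a balance).

```python
import numpy as np
import pandas as pd

# Sample data from the question (date columns omitted for brevity)
df = pd.DataFrame({'bal_pred': [10., 11., 12., 13., 21., 22., 23.],
                   'dbal_pred': [np.nan, .25, .3, .5, np.nan, .4, .45]})

# Every NaN bumps the counter, so rows of the same run share one label
run_id = df['dbal_pred'].isna().cumsum()
print(run_id.tolist())   # [1, 1, 1, 1, 2, 2, 2]

# Each run now starts with its opening balance, followed by its multipliers
factors = df['dbal_pred'].fillna(df['bal_pred'])
print(factors.tolist())  # [10.0, 0.25, 0.3, 0.5, 21.0, 0.4, 0.45]

# A running product within each run reproduces the recurrence
result = df.assign(bal_pred=factors.groupby(run_id).cumprod())
print(result)
```

Within a run, the running product of [10, 0.25, 0.3, 0.5] is exactly the sequence 10, 10*0.25, 10*0.25*0.3, and so on, which is what multiplying each row by the previous updated balance produces.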

Answer 1 (score: 2)

# fillna with 1 so we can cumprod
c = df.dbal_pred.fillna(1).cumprod()

# track where null
n = df.dbal_pred.isnull()

# take cumprod where null and forward fill
d = c.where(n).ffill()

# cumprods divided by cumprod where last null
# gets us a grouped cumprod that starts over
# at every null.
# multiply this by `bal_pred` where null forward filled
# and voila
df.assign(bal_pred=c.div(d) * df.bal_pred.where(n).ffill())

   bal_pred        date  dbal_pred  origin_dte
0    10.000  2009-08-01        NaN  2009-08-01
1     2.500  2009-08-02       0.25  2009-08-01
2     0.750  2009-08-03       0.30  2009-08-01
3     0.375  2009-08-04       0.50  2009-08-01
4    21.000  2009-09-01        NaN  2009-09-01
5     8.400  2009-09-02       0.40  2009-09-01
6     3.780  2009-09-03       0.45  2009-09-01
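
One caveat worth checking before relying on the division trick: it divides by a running product taken over the whole column, so a multiplier of exactly 0 anywhere in dbal_pred makes every later run come out as NaN (0/0). The sketch below uses made-up data (not from the question) to show this next to a grouped cumulative product, which restarts at each NaN and needs no division. The names zdf, by_division and by_group are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a zero multiplier in the first run
zdf = pd.DataFrame({'bal_pred': [10., 11., 12., 21., 22.],
                    'dbal_pred': [np.nan, 0., .5, np.nan, .4]})

# Division approach: whole-column cumprod divided by its value at the last NaN
c = zdf.dbal_pred.fillna(1).cumprod()
n = zdf.dbal_pred.isnull()
d = c.where(n).ffill()
by_division = c.div(d) * zdf.bal_pred.where(n).ffill()

# Grouped cumulative product: restarts at every NaN, so the zero stays local
run_id = n.cumsum()
by_group = zdf.dbal_pred.fillna(zdf.bal_pred).groupby(run_id).cumprod()

print(by_division.tolist())
print(by_group.tolist())
```

Here by_division ends in two NaN values for the second run, while by_group gives 21 and 8.4 as the row-by-row rule would. If dbal_pred can never be zero, both approaches agree.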