Pandas: fill a column with values based on a condition

Time: 2018-11-26 23:02:33

Tags: python pandas pandas-groupby

I have a column in my pandas DataFrame that looks like this:

    Status
1   Past Due
2   Yet to Calc
3   Overdue
4   Past Due
5   Past Due
6   Yet to Calc
7   Past Due
8   Past Due
9   Past Due
10  Yet to Calc
11  Overdue
12  Yet to Calc
13  Past Due
14  Past Due
15  Past Due
16  Yet to Calc
17  Overdue
18  Past Due
19  Past Due
20  Past Due
21  Yet to Calc

I want to fill every value between 'Overdue' and 'Yet to Calc' with 'Overdue'. So my expected output is:

    Status
1   Past Due
2   Yet to Calc
3   Overdue
4   Overdue
5   Overdue
6   Yet to Calc
7   Past Due
8   Past Due
9   Past Due
10  Yet to Calc
11  Overdue
12  Yet to Calc
13  Past Due
14  Past Due
15  Past Due
16  Yet to Calc
17  Overdue
18  Overdue
19  Overdue
20  Overdue
21  Yet to Calc

I tried grouping by a slice and forward filling within the groups, like this:

df3['Inventory_1'] = df3.groupby(df3.loc['Overdue':'Yet to Calc','Inventory_1']).ffill()

But the above returns an empty Series and does not fill anything.
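The grouping idea itself can work if the groups are derived from the cell values rather than from a .loc slice, since .loc selects by index labels, not by the contents of the column. Below is a minimal sketch of such a groupby-based variant. It assumes the column holds the same three labels as in the question. The short sample Series and the names is_marker, block_id and block_start are hypothetical and used only for illustration. Every 'Overdue' or 'Yet to Calc' row starts a new block, and rows in a block opened by 'Overdue' are overwritten.

```python
import pandas as pd

# Hypothetical sample column using the same labels as the question.
status = pd.Series(
    ["Past Due", "Overdue", "Past Due", "Yet to Calc", "Past Due", "Overdue", "Past Due"]
)

# Each marker row starts a new block; the running count numbers the blocks.
is_marker = status.isin(["Overdue", "Yet to Calc"])
block_id = is_marker.cumsum()

# The first value in each block is the marker that opened it
# (or an ordinary value for rows before the first marker).
block_start = status.groupby(block_id).transform("first")

# Overwrite every row whose block was opened by 'Overdue'.
result = status.mask(block_start.eq("Overdue"), "Overdue")
print(result.tolist())
```

As with a plain forward fill, a trailing 'Overdue' block that is never closed by 'Yet to Calc' is filled to the end of the column.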

How can I do this, if not with ffill?

1 answer:

Answer 0 (score: 4)

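The general idea here is to mask every value that is not 'Overdue' or 'Yet to Calc' and then use ffill. However, this also forward fills 'Yet to Calc', which is never wanted. 'Yet to Calc' only needs to be kept to stop the 'Overdue' fill. So we can replace everything in the result that is not 'Overdue' with the corresponding value from the original DataFrame.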
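The two steps described above can be seen on a tiny hypothetical Series, which is not the question's data. This sketch prints the intermediate forward fill, where the 'Yet to Calc' marker also leaks into the following row, and then the corrected result. For the second step it uses Series.where, which keeps values where the condition holds and takes the rest from the original.

```python
import pandas as pd

# Small hypothetical column, only to show the intermediate step.
status = pd.Series(["Past Due", "Overdue", "Past Due", "Yet to Calc", "Past Due"])

# Step 1: hide everything except the two marker values, then forward fill.
filled = status.mask(~status.isin(["Overdue", "Yet to Calc"])).ffill()
print(filled.tolist())
# The last row is now 'Yet to Calc': the stop marker was forward filled too.

# Step 2: keep the fill only where it says 'Overdue'; restore all other rows.
result = filled.where(filled.eq("Overdue"), status)
print(result.tolist())
```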


isin + mask + ffill

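# Blank out every value that is not a marker, then forward fill the markers
s = df.Status.mask(~df.Status.isin(['Overdue', 'Yet to Calc'])).ffill()
# Keep the fill only where it says 'Overdue'; restore everything else from df
s[s.ne('Overdue')] = df.Status

print(s)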
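Here is a self-contained, runnable sketch for checking this approach against the expected output. It rebuilds the sample column from the question, assuming an index of 1 to 21 as listed, and applies the same isin + mask + ffill idea. For the last step it uses Series.where instead of boolean assignment, which should give the same result here.

```python
import pandas as pd

# Sample column from the question, with the index 1..21 as listed.
statuses = [
    "Past Due", "Yet to Calc", "Overdue", "Past Due", "Past Due",
    "Yet to Calc", "Past Due", "Past Due", "Past Due", "Yet to Calc",
    "Overdue", "Yet to Calc", "Past Due", "Past Due", "Past Due",
    "Yet to Calc", "Overdue", "Past Due", "Past Due", "Past Due",
    "Yet to Calc",
]
df = pd.DataFrame({"Status": statuses}, index=range(1, 22))

# Keep only the marker values and forward fill them.
filled = df.Status.mask(~df.Status.isin(["Overdue", "Yet to Calc"])).ffill()

# Use the fill only where it is 'Overdue'; elsewhere keep the original value.
df["Status"] = filled.where(filled.eq("Overdue"), df.Status)
print(df)
```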