I have a column in my pandas DataFrame that looks like this:
Status
1 Past Due
2 Yet to Calc
3 Overdue
4 Past Due
5 Past Due
6 Yet to Calc
7 Past Due
8 Past Due
9 Past Due
10 Yet to Calc
11 Overdue
12 Yet to Calc
13 Past Due
14 Past Due
15 Past Due
16 Yet to Calc
17 Overdue
18 Past Due
19 Past Due
20 Past Due
21 Yet to Calc
I want to fill every value between an 'Overdue' and the next 'Yet to Calc' with 'Overdue'. So my expected output is:
Status
1 Past Due
2 Yet to Calc
3 Overdue
4 Overdue
5 Overdue
6 Yet to Calc
7 Past Due
8 Past Due
9 Past Due
10 Yet to Calc
11 Overdue
12 Yet to Calc
13 Past Due
14 Past Due
15 Past Due
16 Yet to Calc
17 Overdue
18 Overdue
19 Overdue
20 Overdue
21 Yet to Calc
I tried grouping by a slice and forward-filling within the groups, like this:
df3['Inventory_1'] = df3.groupby(df3.loc['Overdue':'Yet to Calc','Inventory_1']).ffill()
But the above returns an empty Series and does not fill anything.
If ffill is not the way to go, how should I approach this?
Answer 0 (score: 4)
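The general idea here is to mask every value that is not Overdue or Yet to Calc, and then use ffill. However, this also forward-fills Yet to Calc, which is never wanted. Since Yet to Calc only had to be kept so that it stops the filling of Overdue values, we can replace everything in the result that is not Overdue with whatever the original DataFrame held at that position.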
mask + ffill + isin
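# Keep only the two marker values, then forward-fill the gaps
s = df.Status.mask(~df.Status.isin(['Overdue', 'Yet to Calc'])).ffill()
# Restore the original value wherever the fill did not produce 'Overdue'
s[s.ne('Overdue')] = df.Status
print(s)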
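To see each step in isolation, here is a minimal self-contained sketch that rebuilds the sample column from the question and applies the same idea. The names `values`, `markers`, `masked`, `filled` and `result` are illustrative and not part of the original answer, and the last step uses `Series.where` in place of the boolean-indexed assignment, which should give the same result here.

```python
import pandas as pd

# Sample data copied from the question; the index runs from 1 to 21.
values = [
    "Past Due", "Yet to Calc", "Overdue", "Past Due", "Past Due",
    "Yet to Calc", "Past Due", "Past Due", "Past Due", "Yet to Calc",
    "Overdue", "Yet to Calc", "Past Due", "Past Due", "Past Due",
    "Yet to Calc", "Overdue", "Past Due", "Past Due", "Past Due",
    "Yet to Calc",
]
df = pd.DataFrame({"Status": values}, index=range(1, 22))

# Hypothetical helper name: the two values that start and stop a fill.
markers = ["Overdue", "Yet to Calc"]

# Step 1: blank out everything that is not a marker (here: "Past Due").
masked = df["Status"].mask(~df["Status"].isin(markers))

# Step 2: forward-fill, so each gap takes the last marker seen above it.
filled = masked.ffill()

# Step 3: keep only the "Overdue" fills; everywhere else use the original.
result = filled.where(filled.eq("Overdue"), df["Status"])

print(result)
```

After step 2, rows 7 to 9 temporarily hold 'Yet to Calc'. Step 3 is what puts 'Past Due' back there while leaving rows 4, 5 and 18 to 20 as 'Overdue'.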