I have a DataFrame like this:
period payor variance_charges
6/1/2018 LIABILITY PLANS 4631.6667
7/1/2018 LIABILITY PLANS -1125.8333
8/1/2018 LIABILITY PLANS -12688.3333
9/1/2018 LIABILITY PLANS -1657.5
10/1/2018 LIABILITY PLANS -14806.6667
11/1/2018 LIABILITY PLANS 13910.8333
12/1/2018 LIABILITY PLANS 12154.1667
6/1/2018 MEDICAID CMO -39174.5817
7/1/2018 MEDICAID CMO 59504.5767
8/1/2018 MEDICAID CMO 13967.4883
9/1/2018 MEDICAID CMO -158103.49
10/1/2018 MEDICAID CMO -71191.9667
11/1/2018 MEDICAID CMO -405366.1217
12/1/2018 MEDICAID CMO -21637.05
I want to count how many negative values fall in each rolling window (3 rows per window) after grouping by the payor column:
period payor variance_charges count_neg
6/1/2018 LIABILITY PLANS 4631.6667 0
7/1/2018 LIABILITY PLANS -1125.8333 1
8/1/2018 LIABILITY PLANS -12688.3333 2
9/1/2018 LIABILITY PLANS -1657.5 3
10/1/2018 LIABILITY PLANS -14806.6667 3
11/1/2018 LIABILITY PLANS 13910.8333 2
12/1/2018 LIABILITY PLANS 12154.1667 1
6/1/2018 MEDICAID CMO -39174.5817 1
7/1/2018 MEDICAID CMO 59504.5767 1
8/1/2018 MEDICAID CMO 13967.4883 1
9/1/2018 MEDICAID CMO -158103.49 1
10/1/2018 MEDICAID CMO -71191.9667 2
11/1/2018 MEDICAID CMO -405366.12 3
12/1/2018 MEDICAID CMO -21637.05 3
I tried the following code:
df.sort_values(by = 'period', ascending=True)
df['count_neg'] = df.groupby(['payor'])['variance_charges'].transform(lambda x: x.rolling(6, min_periods=1).apply(lambda n: sum(n < 0 for n in x), raw = False))
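With the code above, I get the number of negative values across the whole group, without taking the window into account. The incorrect result I get looks like this: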
period payor variance_charges count_neg
6/1/2018 LIABILITY PLANS 4631.6667 4
7/1/2018 LIABILITY PLANS -1125.8333 4
8/1/2018 LIABILITY PLANS -12688.3333 4
9/1/2018 LIABILITY PLANS -1657.5 4
10/1/2018 LIABILITY PLANS -14806.6667 4
11/1/2018 LIABILITY PLANS 13910.8333 4
12/1/2018 LIABILITY PLANS 12154.1667 4
6/1/2018 MEDICAID CMO -39174.5817 5
7/1/2018 MEDICAID CMO 59504.5767 5
8/1/2018 MEDICAID CMO 13967.4883 5
9/1/2018 MEDICAID CMO -158103.49 5
10/1/2018 MEDICAID CMO -71191.9667 5
11/1/2018 MEDICAID CMO -405366.17 5
12/1/2018 MEDICAID CMO -21637.05 5
Please help me solve this.
Answer 0 (score: 1)
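You can simplify the function by removing for n in x, which iterates over the whole group x instead of the current window n (the window size is also set to 3 here, rather than 6):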
# n is the current window, so sum(n < 0) counts only the negatives inside that window
f = lambda x: x.rolling(3, min_periods=1).apply(lambda n: sum(n < 0), raw=False)
df['count_neg1'] = df.groupby(['payor'])['variance_charges'].transform(f).astype(int)
print(df)
period payor variance_charges count_neg count_neg1
0 6/1/2018 LIABILITY PLANS 4631.6667 0 0
1 7/1/2018 LIABILITY PLANS -1125.8333 1 1
2 8/1/2018 LIABILITY PLANS -12688.3333 2 2
3 9/1/2018 LIABILITY PLANS -1657.5000 3 3
4 10/1/2018 LIABILITY PLANS -14806.6667 3 3
5 11/1/2018 LIABILITY PLANS 13910.8333 2 2
6 12/1/2018 LIABILITY PLANS 12154.1667 1 1
7 6/1/2018 MEDICAID CMO -39174.5817 1 1
8 7/1/2018 MEDICAID CMO 59504.5767 1 1
9 8/1/2018 MEDICAID CMO 13967.4883 1 1
10 9/1/2018 MEDICAID CMO -158103.4900 1 1
11 10/1/2018 MEDICAID CMO -71191.9667 2 2
12 11/1/2018 MEDICAID CMO -405366.1200 3 3
13 12/1/2018 MEDICAID CMO -21637.0500 3 3
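As a possible alternative to the apply-based version above, here is a minimal sketch that flags negatives as 0/1 first and then takes a rolling sum of the flags within each payor, which avoids calling a Python lambda for every window. This is not part of the original answer. The frame is rebuilt from the question's sample data, and the names is_neg and count_neg2 are chosen here for illustration only.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "period": ["6/1/2018", "7/1/2018", "8/1/2018", "9/1/2018",
               "10/1/2018", "11/1/2018", "12/1/2018"] * 2,
    "payor": ["LIABILITY PLANS"] * 7 + ["MEDICAID CMO"] * 7,
    "variance_charges": [
        4631.6667, -1125.8333, -12688.3333, -1657.5,
        -14806.6667, 13910.8333, 12154.1667,
        -39174.5817, 59504.5767, 13967.4883, -158103.49,
        -71191.9667, -405366.1217, -21637.05,
    ],
})

# 1 where the value is negative, 0 otherwise
is_neg = df["variance_charges"].lt(0).astype(int)

# Rolling sum of the flags over the last 3 rows, computed separately per payor.
# transform keeps the result aligned with the original index.
df["count_neg2"] = (
    is_neg.groupby(df["payor"])
          .transform(lambda s: s.rolling(3, min_periods=1).sum())
          .round()
          .astype(int)
)

print(df)
```

On the sample data this should give the same counts as the expected count_neg column in the question. Note also that df.sort_values(...) returns a new DataFrame, so the sort line in the question has no effect unless its result is assigned back to df.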