How to apply a custom function after grouping and rolling over a specific window (using the apply method)

Time: 2019-04-23 08:15:25

Tags: python pandas aggregate apply rolling-computation

I have a DataFrame like this:

period      payor            variance_charges
6/1/2018    LIABILITY PLANS  4631.6667
7/1/2018    LIABILITY PLANS  -1125.8333
8/1/2018    LIABILITY PLANS  -12688.3333
9/1/2018    LIABILITY PLANS  -1657.5
10/1/2018   LIABILITY PLANS  -14806.6667
11/1/2018   LIABILITY PLANS  13910.8333
12/1/2018   LIABILITY PLANS  12154.1667
6/1/2018    MEDICAID CMO     -39174.5817
7/1/2018    MEDICAID CMO     59504.5767
8/1/2018    MEDICAID CMO     13967.4883
9/1/2018    MEDICAID CMO     -158103.49
10/1/2018   MEDICAID CMO     -71191.9667
11/1/2018   MEDICAID CMO     -405366.1217
12/1/2018   MEDICAID CMO     -21637.05

After grouping by the payor column, I want to count how many negative values fall in each rolling window (3 rows per window):

period      payor     variance_charges  count_neg
6/1/2018    LIABILITY PLANS 4631.6667   0
7/1/2018    LIABILITY PLANS -1125.8333  1
8/1/2018    LIABILITY PLANS -12688.3333 2
9/1/2018    LIABILITY PLANS -1657.5     3
10/1/2018   LIABILITY PLANS -14806.6667 3
11/1/2018   LIABILITY PLANS 13910.8333  2
12/1/2018   LIABILITY PLANS 12154.1667  1
6/1/2018    MEDICAID CMO    -39174.5817 1
7/1/2018    MEDICAID CMO    59504.5767  1
8/1/2018    MEDICAID CMO    13967.4883  1
9/1/2018    MEDICAID CMO    -158103.49  1
10/1/2018   MEDICAID CMO    -71191.9667 2
11/1/2018   MEDICAID CMO    -405366.12  3
12/1/2018   MEDICAID CMO    -21637.05   3

I tried the following code:

df.sort_values(by = 'period', ascending=True)
df['count_neg'] = df.groupby(['payor'])['variance_charges'].transform(lambda x: x.rolling(6, min_periods=1).apply(lambda n: sum(n < 0 for n in x), raw = False))

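With the code above, I get the number of negative values in the whole group, regardless of the window. The incorrect result I get looks like this: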

period      payor    variance_charges   count_neg
6/1/2018    LIABILITY PLANS 4631.6667   4
7/1/2018    LIABILITY PLANS -1125.8333  4
8/1/2018    LIABILITY PLANS -12688.3333 4
9/1/2018    LIABILITY PLANS -1657.5     4
10/1/2018   LIABILITY PLANS -14806.6667 4
11/1/2018   LIABILITY PLANS 13910.8333  4
12/1/2018   LIABILITY PLANS 12154.1667  4
6/1/2018    MEDICAID CMO    -39174.5817 5
7/1/2018    MEDICAID CMO    59504.5767  5
8/1/2018    MEDICAID CMO    13967.4883  5
9/1/2018    MEDICAID CMO    -158103.49  5
10/1/2018   MEDICAID CMO    -71191.9667 5
11/1/2018   MEDICAID CMO    -405366.12  5
12/1/2018   MEDICAID CMO    -21637.05   5

Please help me solve this.

1 answer:

Answer 0 (score: 1)

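The inner lambda iterates over x (the whole group) instead of the window n, so every window returns the total number of negative values in the group. You can fix and simplify the function by removing for n in x, with the window size set to 3: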

f = lambda x: x.rolling(3, min_periods=1).apply(lambda n: sum(n < 0), raw = False)
df['count_neg1'] = df.groupby(['payor'])['variance_charges'].transform(f).astype(int)

print (df)
       period            payor  variance_charges  count_neg  count_neg1
0    6/1/2018  LIABILITY PLANS         4631.6667          0           0
1    7/1/2018  LIABILITY PLANS        -1125.8333          1           1
2    8/1/2018  LIABILITY PLANS       -12688.3333          2           2
3    9/1/2018  LIABILITY PLANS        -1657.5000          3           3
4   10/1/2018  LIABILITY PLANS       -14806.6667          3           3
5   11/1/2018  LIABILITY PLANS        13910.8333          2           2
6   12/1/2018  LIABILITY PLANS        12154.1667          1           1
7    6/1/2018     MEDICAID CMO       -39174.5817          1           1
8    7/1/2018     MEDICAID CMO        59504.5767          1           1
9    8/1/2018     MEDICAID CMO        13967.4883          1           1
10   9/1/2018     MEDICAID CMO      -158103.4900          1           1
11  10/1/2018     MEDICAID CMO       -71191.9667          2           2
12  11/1/2018     MEDICAID CMO      -405366.1200          3           3
13  12/1/2018     MEDICAID CMO       -21637.0500          3           3
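
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch built from the question's data. The names buggy, fixed and vectorized are illustrative only. The date parsing, the reassigned sort and the vectorized variant are additions that do not appear in the answer above; they assume period holds month/day/year strings and that rows should be ordered chronologically within each payor.

```python
import pandas as pd

df = pd.DataFrame({
    "period": ["6/1/2018", "7/1/2018", "8/1/2018", "9/1/2018",
               "10/1/2018", "11/1/2018", "12/1/2018"] * 2,
    "payor": ["LIABILITY PLANS"] * 7 + ["MEDICAID CMO"] * 7,
    "variance_charges": [
        4631.6667, -1125.8333, -12688.3333, -1657.5,
        -14806.6667, 13910.8333, 12154.1667,
        -39174.5817, 59504.5767, 13967.4883, -158103.49,
        -71191.9667, -405366.1217, -21637.05,
    ],
})

# Assumption: parse the dates so the sort is chronological, not lexicographic.
# sort_values returns a new frame by default, so the result must be assigned.
df["period"] = pd.to_datetime(df["period"], format="%m/%d/%Y")
df = df.sort_values(["payor", "period"]).reset_index(drop=True)

grouped = df.groupby("payor")["variance_charges"]

# Shape of the question's code: the generator walks the whole group `x`,
# so the window `n` is never used and each row gets the group total.
buggy = grouped.transform(
    lambda x: x.rolling(3, min_periods=1).apply(
        lambda n: sum(v < 0 for v in x), raw=False
    )
).astype(int)

# Shape of the answer's code: count negatives inside the window `n` only.
fixed = grouped.transform(
    lambda x: x.rolling(3, min_periods=1).apply(
        lambda n: (n < 0).sum(), raw=False
    )
).astype(int)

# Alternative without apply: turn the test into 0/1 flags first,
# then take a rolling sum of the flags per payor.
is_negative = df["variance_charges"].lt(0).astype(int)
vectorized = is_negative.groupby(df["payor"]).transform(
    lambda s: s.rolling(3, min_periods=1).sum()
).astype(int)

df["count_neg"] = fixed
print(df)
```

The rolling sum over 0/1 flags gives the same counts as the apply-based version while avoiding a Python-level call for every window, which may matter on larger frames.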