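I have a dataframe that summarizes the dollar amount each customer holds in their account in a given month. If a customer has no money, the amount is simply 0. The dataframe looks like this: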
A B C D E F
11/30/2015 0 1000 0 0 5000 0
12/31/2015 2000 1000 0 3000 5000 2000
1/31/2016 2000 0 0 3000 5000 2000
2/29/2016 2000 2000 4000 3000 5000 2000
3/31/2016 2000 2000 4000 0 10000 2000
4/30/2016 0 2000 4000 0 10000 0
5/31/2016 0 2000 4000 0 10000 0
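When a customer first comes online, they go from 0 to a nominal amount in a particular month (or they start with a nominal amount in November). A customer therefore counts as a "new customer" in the month of their first nominal amount.
I would like to add a column at the end of the dataframe that totals the amounts for "new" customers.
I have been able to count the number of "new" customers (see the code below), but I have not been able to change the code to sum the amounts instead.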
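def new_customer(column):
    # True only when the latest value is nonzero and all earlier values are zero
    return column[-1] and not any(column[:-1])

# Counts new customers per month; raw=True passes each expanding window as a NumPy array
table['new_loans'] = table.iloc[:, :len(table.columns)].expanding().apply(new_customer, raw=True).sum(axis=1).astype(int)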
The resulting dataframe should look like this:
A B C D E F New_Customers
11/30/2015 0 1000 0 0 5000 0 6000
12/31/2015 2000 1000 0 3000 5000 2000 7000
1/31/2016 2000 0 0 3000 5000 2000 0
2/29/2016 2000 2000 4000 3000 5000 2000 4000
3/31/2016 2000 2000 4000 0 10000 2000 0
4/30/2016 0 2000 4000 0 10000 0 0
5/31/2016 0 2000 4000 0 10000 0 0
Answer 0 (score: 3)
Use:
df['New_Customers'] = df.where(df.ne(0).cumsum().eq(1)).sum(axis=1)
print (df)
A B C D E F New_Customers
11/30/2015 0 1000 0 0 5000 0 6000.0
12/31/2015 2000 1000 0 3000 5000 2000 7000.0
1/31/2016 2000 0 0 3000 5000 2000 0.0
2/29/2016 2000 2000 4000 3000 5000 2000 4000.0
3/31/2016 2000 2000 4000 0 10000 2000 0.0
4/30/2016 0 2000 4000 0 10000 0 0.0
5/31/2016 0 2000 4000 0 10000 0 0.0
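To try the one-liner without the original data file, here is a minimal sketch that rebuilds the sample table by hand. The values are copied from the question. The string index and the variable names first_nonzero and new_customers are my own choices. The final astype(int) is only there to match the integer column shown in the desired output.

```python
import pandas as pd

# Rebuild the sample table from the question; month-end dates are the index.
df = pd.DataFrame(
    {
        "A": [0, 2000, 2000, 2000, 2000, 0, 0],
        "B": [1000, 1000, 0, 2000, 2000, 2000, 2000],
        "C": [0, 0, 0, 4000, 4000, 4000, 4000],
        "D": [0, 3000, 3000, 3000, 0, 0, 0],
        "E": [5000, 5000, 5000, 5000, 10000, 10000, 10000],
        "F": [0, 2000, 2000, 2000, 2000, 0, 0],
    },
    index=["11/30/2015", "12/31/2015", "1/31/2016", "2/29/2016",
           "3/31/2016", "4/30/2016", "5/31/2016"],
)

# Flag the first nonzero month per customer, then sum the flagged amounts per row.
first_nonzero = df.ne(0).cumsum().eq(1)
new_customers = df.where(first_nonzero).sum(axis=1).astype(int)

df["New_Customers"] = new_customers
print(df)
```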
Explanation:
First, compare the DataFrame with 0 using DataFrame.ne (!=):
print (df.ne(0))
A B C D E F
11/30/2015 False True False False True False
12/31/2015 True True False True True True
1/31/2016 True False False True True True
2/29/2016 True True True True True True
3/31/2016 True True True False True True
4/30/2016 False True True False True False
5/31/2016 False True True False True False
Then take the cumulative sum of the boolean mask with DataFrame.cumsum:
print (df.ne(0).cumsum())
A B C D E F
11/30/2015 0 1 0 0 1 0
12/31/2015 1 2 0 1 2 1
1/31/2016 2 2 0 2 3 2
2/29/2016 3 3 1 3 4 3
3/31/2016 4 4 2 3 5 4
4/30/2016 4 5 3 3 6 4
5/31/2016 4 6 4 3 7 4
Compare the result with 1 using DataFrame.eq (==) to mark the first nonzero value in each column:
print (df.ne(0).cumsum().eq(1))
A B C D E F
11/30/2015 False True False False True False
12/31/2015 True False False True False True
1/31/2016 False False False False False False
2/29/2016 False False True False False False
3/31/2016 False False False False False False
4/30/2016 False False False False False False
5/31/2016 False False False False False False
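One case that the sample data does not exercise: if a customer drops back to 0 right after their first funded month, the cumulative count stays at 1, so the mask remains True on those zero rows. As far as I can tell this is harmless, because the values kept there are 0 and add nothing to the row sum. A small sketch with a hypothetical customer X (not part of the question's data):

```python
import pandas as pd

# Hypothetical customer X: first funded in month 2, back to zero in month 3,
# funded again in month 4.
s = pd.DataFrame({"X": [0, 1000, 0, 500]})

mask = s.ne(0).cumsum().eq(1)
row_sums = s.where(mask).sum(axis=1)

print(mask["X"].tolist())   # the mask is also True on the zero row after the first funding
print(row_sums.tolist())    # only the first funded amount contributes
```

The second funding (500) is not counted, because the cumulative count has already reached 2 by then.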
Replace every value outside the mask with NaN using DataFrame.where:
print (df.where(df.ne(0).cumsum().eq(1)))
A B C D E F
11/30/2015 NaN 1000.0 NaN NaN 5000.0 NaN
12/31/2015 2000.0 NaN NaN 3000.0 NaN 2000.0
1/31/2016 NaN NaN NaN NaN NaN NaN
2/29/2016 NaN NaN 4000.0 NaN NaN NaN
3/31/2016 NaN NaN NaN NaN NaN NaN
4/30/2016 NaN NaN NaN NaN NaN NaN
5/31/2016 NaN NaN NaN NaN NaN NaN
Finally, sum across the columns of each row (axis=1):
print (df.where(df.ne(0).cumsum().eq(1)).sum(axis=1))
11/30/2015 6000.0
12/31/2015 7000.0
1/31/2016 0.0
2/29/2016 4000.0
3/31/2016 0.0
4/30/2016 0.0
5/31/2016 0.0
dtype: float64
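The float64 result comes from the NaN values that where inserts. If integer totals are preferred, one possible variant is to pass 0 as the replacement value, which should leave the integer dtype untouched in recent pandas versions (I have not checked older ones). The sketch below uses only the first four months of customers A, B and C from the question, and the names mask and totals are mine:

```python
import pandas as pd

# First four months of customers A, B and C from the question.
df = pd.DataFrame(
    {"A": [0, 2000, 2000, 2000],
     "B": [1000, 1000, 0, 2000],
     "C": [0, 0, 0, 4000]},
    index=["11/30/2015", "12/31/2015", "1/31/2016", "2/29/2016"],
)

mask = df.ne(0).cumsum().eq(1)

# Passing 0 as the replacement value avoids NaN, so no float upcast is forced.
totals = df.where(mask, 0).sum(axis=1)
print(totals)
```

Calling .astype(int) on the float result, as the question's own code does, is an equally simple alternative.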