汇总时间序列中的首次客户$的金额

时间:2018-09-21 11:24:04

标签: python pandas

我有一个数据框,用于汇总特定月份客户在其帐户中拥有的$金额。如果客户没有钱,则金额仅为0。数据框如下所示:

            A       B       C       D       E        F   
11/30/2015  0       1000    0       0       5000     0   
12/31/2015  2000    1000    0       3000    5000     2000
1/31/2016   2000    0       0       3000    5000     2000
2/29/2016   2000    2000    4000    3000    5000     2000
3/31/2016   2000    2000    4000    0       10000    2000
4/30/2016   0       2000    4000    0       10000    0   
5/31/2016   0       2000    4000    0       10000    0   

当客户首次上线时,他们从特定月份的0变为名义金额(或者从11月的名义金额开始)。因此,当特定客户的第一笔名义金额为“新客户”时。

我想在数据框的末尾添加一列,以总计“新”客户的金额。

我已经能够计算出“新”客户的数量(请参见下面的代码),但是我无法更改代码来求和。

def new_customer(column):
    return column[-1] and not any(column[:-1])
table['new_loans'] = table.iloc[:, len(table.columns)].expanding().apply(new_customer).sum(axis=1).astype(int)

结果数据框应如下所示:

            A       B       C       D       E        F       New_Customers 
11/30/2015  0       1000    0       0       5000     0       6000 
12/31/2015  2000    1000    0       3000    5000     2000    7000
1/31/2016   2000    0       0       3000    5000     2000    0
2/29/2016   2000    2000    4000    3000    5000     2000    4000
3/31/2016   2000    2000    4000    0       10000    2000    0
4/30/2016   0       2000    4000    0       10000    0       0
5/31/2016   0       2000    4000    0       10000    0       0

1 个答案:

答案 0 :(得分:3)

使用:

df['New_Customers'] = df.where(df.ne(0).cumsum().eq(1)).sum(axis=1)
print (df)
               A     B     C     D      E     F  New_Customers
11/30/2015     0  1000     0     0   5000     0         6000.0
12/31/2015  2000  1000     0  3000   5000  2000         7000.0
1/31/2016   2000     0     0  3000   5000  2000            0.0
2/29/2016   2000  2000  4000  3000   5000  2000         4000.0
3/31/2016   2000  2000  4000     0  10000  2000            0.0
4/30/2016      0  2000  4000     0  10000     0            0.0
5/31/2016      0  2000  4000     0  10000     0            0.0

说明

首先将DataFrame.ne!=)与0进行比较:

print (df.ne(0))
                A      B      C      D     E      F
11/30/2015  False   True  False  False  True  False
12/31/2015   True   True  False   True  True   True
1/31/2016    True  False  False   True  True   True
2/29/2016    True   True   True   True  True   True
3/31/2016    True   True   True  False  True   True
4/30/2016   False   True   True  False  True  False
5/31/2016   False   True   True  False  True  False

布尔型掩码的累积总和DataFrame.cumsum

print (df.ne(0).cumsum())
            A  B  C  D  E  F
11/30/2015  0  1  0  0  1  0
12/31/2015  1  2  0  1  2  1
1/31/2016   2  2  0  2  3  2
2/29/2016   3  3  1  3  4  3
3/31/2016   4  4  2  3  5  4
4/30/2016   4  5  3  3  6  4
5/31/2016   4  6  4  3  7  4

1DataFrame.eq (==)进行比较-前1

print (df.ne(0).cumsum().eq(1))
                A      B      C      D      E      F
11/30/2015  False   True  False  False   True  False
12/31/2015   True  False  False   True  False   True
1/31/2016   False  False  False  False  False  False
2/29/2016   False  False   True  False  False  False
3/31/2016   False  False  False  False  False  False
4/30/2016   False  False  False  False  False  False
5/31/2016   False  False  False  False  False  False

DataFrame.where之前将值替换为NaN

print (df.where(df.ne(0).cumsum().eq(1)))
                 A       B       C       D       E       F
11/30/2015     NaN  1000.0     NaN     NaN  5000.0     NaN
12/31/2015  2000.0     NaN     NaN  3000.0     NaN  2000.0
1/31/2016      NaN     NaN     NaN     NaN     NaN     NaN
2/29/2016      NaN     NaN  4000.0     NaN     NaN     NaN
3/31/2016      NaN     NaN     NaN     NaN     NaN     NaN
4/30/2016      NaN     NaN     NaN     NaN     NaN     NaN
5/31/2016      NaN     NaN     NaN     NaN     NaN     NaN

每列最后sum

print (df.where(df.ne(0).cumsum().eq(1)).sum(axis=1))
11/30/2015    6000.0
12/31/2015    7000.0
1/31/2016        0.0
2/29/2016     4000.0
3/31/2016        0.0
4/30/2016        0.0
5/31/2016        0.0
dtype: float64