This is my data frame:
+--------+-------------+----------+---------------+------------+-------------+-----------+
| | Customer ID | Quantity | Invoice Value | Date | InvoiceDate | UnitPrice |
+--------+-------------+----------+---------------+------------+-------------+-----------+
| 0 | 500249347 | 0.0 | 0.000 | 2018-01-02 | 2018-01-02 | 0.000 |
+--------+-------------+----------+---------------+------------+-------------+-----------+
| 1 | 500006647 | 1.0 | 33.715 | 2018-01-02 | 2018-01-02 | 33.715 |
+--------+-------------+----------+---------------+------------+-------------+-----------+
| 2 | 500407469 | 1.0 | 33.715 | 2018-01-02 | 2018-01-02 | 33.715 |
+--------+-------------+----------+---------------+------------+-------------+-----------+
| 3 | 500642846 | 0.0 | 0.000 | 2018-01-02 | 2018-01-02 | 0.000 |
+--------+-------------+----------+---------------+------------+-------------+-----------+
| 4 | 500005450 | 1.0 | 33.715 | 2018-01-02 | 2018-01-02 | 33.715 |
+--------+-------------+----------+---------------+------------+-------------+-----------+
| ... | ... | ... | ... | ... | ... | ... |
+--------+-------------+----------+---------------+------------+-------------+-----------+
| 429545 | 500717072 | 1.0 | 45.620 | 2019-03-31 | 2019-03-31 | 45.620 |
+--------+-------------+----------+---------------+------------+-------------+-----------+
| 429546 | 500105174 | 0.0 | 0.000 | 2019-03-31 | 2019-03-31 | 0.000 |
+--------+-------------+----------+---------------+------------+-------------+-----------+
| 429547 | 500069720 | 0.0 | 0.000 | 2019-03-31 | 2019-03-31 | 0.000 |
+--------+-------------+----------+---------------+------------+-------------+-----------+
| 429548 | 500105528 | 0.0 | 0.000 | 2019-03-31 | 2019-03-31 | 0.000 |
+--------+-------------+----------+---------------+------------+-------------+-----------+
| 429549 | 500732322 | 0.0 | 0.000 | 2019-03-31 | 2019-03-31 | 0.000 |
+--------+-------------+----------+---------------+------------+-------------+-----------+
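I want to extract features (new columns) for each customer, such as the days since the last visit (relative to the snapshot date of each row), the last invoice value, the last non-zero invoice value, the quantity, and the days since the last purchase. Can this be done with some custom cumulative aggregation function, or is there a simpler way?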
Answer 0 (score: 0)
I would suggest this:
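import pandas as pd

# Small sample frame with one row per customer visit.
df = pd.DataFrame({'customer_id': [13, 16, 13, 13, 16, 16, 13],
                   'Date': ['2018-01-02', '2019-03-31', '2019-03-31', '2018-01-02', '2018-01-02', '2019-04-30',
                            '2018-01-02'],
                   'Invoice_value': [920, 920, 920, 920, 921, 921, 921],
                   'Unit_price': [1, 2, 3, 4, 6, 7, 8]})
# Parse the date strings so that sorting is chronological rather than lexical.
df['Date'] = pd.to_datetime(df['Date'])
# For each customer, sort that customer's rows by date and keep the most recent one.
append_data = [df[(df['customer_id'] == ac)].sort_values(by=['Date']).iloc[-1] for ac in df.customer_id.unique()]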
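
The same "latest row per customer" result can probably be reached without a Python-level loop. The following is only a sketch on a toy frame shaped like the one in this answer; the name `last_rows` is made up for illustration, and a stable sort is used so that rows sharing a date keep their original relative order.

```python
import pandas as pd

# Toy data in the same shape as the answer's sample frame (illustrative values).
df = pd.DataFrame({
    'customer_id': [13, 16, 13, 13, 16, 16, 13],
    'Date': pd.to_datetime(['2018-01-02', '2019-03-31', '2019-03-31', '2018-01-02',
                            '2018-01-02', '2019-04-30', '2018-01-02']),
    'Invoice_value': [920, 920, 920, 920, 921, 921, 921],
    'Unit_price': [1, 2, 3, 4, 6, 7, 8],
})

# Sort chronologically (stable, so ties keep input order),
# then take the last row of each customer group.
last_rows = (df.sort_values('Date', kind='stable')
               .groupby('customer_id', sort=False)
               .tail(1))
```

This keeps the result as a DataFrame (one row per customer) instead of a list of Series, which tends to be easier to merge back onto the original data.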
Answer 1 (score: 0)
For the days since the last visit, I was thinking of something like this:
df['last_visited']=df.groupby('Customer ID')['Date'].diff()
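
Building on that idea, the other requested features might be derived with grouped `shift` and `ffill` rather than a custom cumulative aggregation. This is a minimal sketch, not a definitive solution: the sample rows and the new column names (`days_since_last_visit`, `last_invoice_value`, `last_nonzero_invoice_value`, `days_since_last_purchase`) are invented for illustration, and it assumes `Date` is a datetime column and that a "purchase" means `Quantity > 0`.

```python
import pandas as pd

# Toy data using the question's column names (values are illustrative).
df = pd.DataFrame({
    'Customer ID': [1, 1, 1, 1, 2, 2],
    'Quantity': [1.0, 0.0, 0.0, 1.0, 0.0, 1.0],
    'Invoice Value': [33.715, 0.0, 0.0, 45.62, 0.0, 33.715],
    'Date': pd.to_datetime(['2018-01-02', '2018-01-05', '2018-01-10',
                            '2018-01-12', '2018-01-02', '2018-01-04']),
})

# Order rows chronologically within each customer before looking backwards.
df = df.sort_values(['Customer ID', 'Date'], kind='stable').reset_index(drop=True)
cust = df['Customer ID']

# Days between this row and the same customer's previous row.
df['days_since_last_visit'] = df.groupby(cust)['Date'].diff().dt.days

# Invoice value of the same customer's previous row.
df['last_invoice_value'] = df.groupby(cust)['Invoice Value'].shift()

# Blank out zero invoices, look one row back, then carry the last seen
# non-zero value forward within each customer.
nonzero = df['Invoice Value'].where(df['Invoice Value'] != 0)
df['last_nonzero_invoice_value'] = (nonzero.groupby(cust).shift()
                                           .groupby(cust).ffill())

# Same pattern for dates: keep only purchase dates, shift, forward-fill,
# then subtract from the current row's date.
purchase_date = df['Date'].where(df['Quantity'] > 0)
last_purchase = purchase_date.groupby(cust).shift().groupby(cust).ffill()
df['days_since_last_purchase'] = (df['Date'] - last_purchase).dt.days
```

The first row of each customer has no history, so every feature is missing (`NaN`) there. Shifting before forward-filling ensures each row only sees strictly earlier rows, which matches the "as of each row's snapshot date" requirement; if a customer can have several rows on the same date, those rows would need to be aggregated first.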