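I have a pandas DataFrame in tabular form, and I'm trying to reshape it into the format shown in the attachment, but without success. I've tried a version with a for loop, but it didn't produce anything. Any help is appreciated.
P.S.: The dates are randomly generated, but they show the gist of what I'm trying to achieve.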
Answer 0 (score: 0)
I've put together a rough version of the code. It will need some edits if the data changes, but check whether it helps.
import pandas as pd

customer_list = ['a', 'b', 'c', 'a', 'c', 'd', 'a', 'b', 'c', 'd']
date_list = ['10/02/2020', '27/01/2020', '27/04/2020', '26/03/2020', '21/02/2020',
             '07/06/2020', '12/04/2020', '29/05/2020', '10/05/2020', '08/06/2020']
dfsrc = pd.DataFrame({'customer': customer_list, 'date': date_list})

results = []
# One column name per purchase; extend this list if a customer
# can have more than three purchases, otherwise purchases[idx] fails.
purchases = ['first_purchase', 'second_purchase', 'third_purchase']
for cust in dfsrc.customer.unique():
    # All rows for this customer, renumbered 0, 1, 2, ... in original order
    rows = dfsrc[dfsrc.customer == cust].reset_index(drop=True)
    cust_dict = {'customer': cust}
    for idx, row in rows.iterrows():
        cust_dict[purchases[idx]] = row['date']
    results.append(cust_dict)

# Customers with fewer than three purchases get NaN in the missing columns
dfdest = pd.DataFrame(data=results, columns=['customer'] + purchases)
print(dfdest)
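For comparison, the same wide table can likely be produced without explicit loops. The following is a minimal sketch, not the answerer's method: it numbers each customer's rows with groupby().cumcount() and spreads them into columns with DataFrame.pivot. It reuses the sample data above; the helper column n is a hypothetical name, and the renaming step assumes at most three purchases per customer, just like the purchases list above.

```python
import pandas as pd

customer_list = ['a', 'b', 'c', 'a', 'c', 'd', 'a', 'b', 'c', 'd']
date_list = ['10/02/2020', '27/01/2020', '27/04/2020', '26/03/2020', '21/02/2020',
             '07/06/2020', '12/04/2020', '29/05/2020', '10/05/2020', '08/06/2020']
dfsrc = pd.DataFrame({'customer': customer_list, 'date': date_list})

# Hypothetical helper column: number each customer's rows 0, 1, 2, ...
# in the order they appear in dfsrc.
dfsrc['n'] = dfsrc.groupby('customer').cumcount()

# One row per customer, one column per purchase number (missing -> NaN).
dfdest = dfsrc.pivot(index='customer', columns='n', values='date')

# Assumption: at most three purchases per customer, matching these names.
names = ['first_purchase', 'second_purchase', 'third_purchase']
dfdest.columns = [names[n] for n in dfdest.columns]
dfdest = dfdest.reset_index()
print(dfdest)
```

Because pivot sorts the index, the customers come out in alphabetical order rather than in order of first appearance; for this sample data the two orders happen to coincide.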
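Answer 1 (score: 0)
I think it's better to use a single column instead of creating a new column for every additional purchase, and to give each subsequent purchase by the same person an incrementing number.
Using dummy data: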
import pandas as pd

df = pd.DataFrame({'Customer': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'],
                   'Date_of_Purchase': ['10/02/2020', '27/01/2020', '27/04/2020', '26/03/2020',
                                        '21/02/2020', '07/06/2020', '12/04/2020', '29/05/2020']})
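First, sort the rows by the Customer column:
# A stable sort (mergesort) keeps each customer's rows in their original order
df.sort_values(by=['Customer'], kind='mergesort', inplace=True)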
Then number each customer's purchases with:
df['n_purchase_times'] = df.groupby(['Customer']).cumcount() + 1
This gives:
  Customer Date_of_Purchase  n_purchase_times
0        a       10/02/2020                 1
4        a       21/02/2020                 2
1        b       27/01/2020                 1
5        b       07/06/2020                 2
2        c       27/04/2020                 1
6        c       12/04/2020                 2
3        d       26/03/2020                 1
7        d       29/05/2020                 2
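The counter here follows the row order within each customer, not the dates: customer c's 12/04/2020 purchase is numbered 2 even though it precedes 27/04/2020. If "first purchase" should mean the earliest date, one option is to parse the dates and sort by customer and date before numbering. A minimal sketch, assuming the dates are day-first (DD/MM/YYYY) as the sample data suggests; Purchase_Date is a hypothetical helper column, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'Customer': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'],
                   'Date_of_Purchase': ['10/02/2020', '27/01/2020', '27/04/2020', '26/03/2020',
                                        '21/02/2020', '07/06/2020', '12/04/2020', '29/05/2020']})

# Assumption: the strings are day-first dates (DD/MM/YYYY).
# Purchase_Date is a hypothetical helper column holding real datetimes.
df['Purchase_Date'] = pd.to_datetime(df['Date_of_Purchase'], format='%d/%m/%Y')

# Sort by customer, then chronologically within each customer.
df = df.sort_values(by=['Customer', 'Purchase_Date'])

# Number each customer's purchases 1, 2, ... in date order.
df['n_purchase_times'] = df.groupby('Customer').cumcount() + 1
print(df[['Customer', 'Date_of_Purchase', 'n_purchase_times']])
```

Parsing into datetimes matters here: sorting the raw strings would compare day numbers first and give the wrong order across months.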