Pandas DataFrame

Time: 2020-08-01 13:53:46

Tags: python pandas dataframe

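I have a Pandas DataFrame in tabular form. I am trying to reshape it into the attached format, but without success. I have tried a version with for loops, but it got me nowhere. Any help would be appreciated.

P.S.: The dates are randomly generated, but they do show the gist of what I am trying to achieve.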

This is what the DataFrame looks like

This is what I want it to look like

2 answers:

Answer 0 (score: 0)

I put together a rough version of the code. It will need some editing if the data changes, but please check whether it helps.

import pandas as pd

# Sample data: one row per purchase.
customer_list = ['a', 'b', 'c', 'a', 'c', 'd', 'a', 'b', 'c', 'd']
date_list = ['10/02/2020', '27/01/2020', '27/04/2020', '26/03/2020', '21/02/2020', '07/06/2020', '12/04/2020', '29/05/2020', '10/05/2020', '08/06/2020']
dfsrc = pd.DataFrame({'customer': customer_list, 'date': date_list})

results = []
# One output column per purchase; extend this list if a customer can have more than three rows.
purchases = ['first_purchase', 'second_purchase', 'third_purchase']

for cust in dfsrc.customer.unique():
    # All rows for this customer, renumbered 0, 1, 2, ... in order of appearance.
    rows = dfsrc[dfsrc.customer == cust].reset_index(drop=True)
    cust_dict = {'customer': cust}
    for idx, row in rows.iterrows():
        cust_dict[purchases[idx]] = row['date']
    results.append(cust_dict)

# Customers with fewer than three purchases get NaN in the remaining columns.
dfdest = pd.DataFrame(data=results, columns=['customer'] + purchases)
dfdest
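The same reshaping can probably be done without an explicit loop. The following is only a sketch, assuming the sample data above; the helper column `purchase_no` and the output labels `purchase_1`, `purchase_2`, ... are names made up for illustration and are not part of the original answer.

```python
import pandas as pd

dfsrc = pd.DataFrame({
    'customer': ['a', 'b', 'c', 'a', 'c', 'd', 'a', 'b', 'c', 'd'],
    'date': ['10/02/2020', '27/01/2020', '27/04/2020', '26/03/2020', '21/02/2020',
             '07/06/2020', '12/04/2020', '29/05/2020', '10/05/2020', '08/06/2020'],
})

# Number each customer's rows 0, 1, 2, ... in order of appearance
# (hypothetical helper column).
dfsrc['purchase_no'] = dfsrc.groupby('customer').cumcount()

# Spread the dates into one column per purchase number.
wide = dfsrc.pivot(index='customer', columns='purchase_no', values='date')

# Hypothetical column labels; rename to whatever the target layout needs.
wide.columns = [f'purchase_{n + 1}' for n in wide.columns]
dfdest = wide.reset_index()
```

Unlike the loop, this does not need a fixed list of column names, so a customer with a fourth purchase would simply produce a fourth column.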

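Answer 1 (score: 0)

I think it is better to use a single column that holds an incrementing counter for each subsequent purchase by the same person, rather than creating a new column for every new purchase.

Using dummy data: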

import pandas as pd

df = pd.DataFrame({
    'Customer': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'],
    'Date_of_Purchase': ['10/02/2020', '27/01/2020', '27/04/2020', '26/03/2020',
                         '21/02/2020', '07/06/2020', '12/04/2020', '29/05/2020'],
})

First, sort the rows by customer name:

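# A stable sort (mergesort) keeps each customer's rows in their original relative order.
df.sort_values(by=['Customer'], kind='mergesort', inplace=True)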

Then number each customer's purchases with the following code:

df['n_purchase_times'] = df.groupby(['Customer']).cumcount() + 1

This gives you:

  Customer Date_of_Purchase  n_purchase_times
0        a       10/02/2020                 1
4        a       21/02/2020                 2
1        b       27/01/2020                 1
5        b       07/06/2020                 2
2        c       27/04/2020                 1
6        c       12/04/2020                 2
3        d       26/03/2020                 1
7        d       29/05/2020                 2
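Note that the counter above follows row order, not calendar order: customer c's 27/04/2020 is numbered 1 even though 12/04/2020 is earlier. If the numbering should be chronological, one option is to parse the strings and sort by date first. This is a sketch that assumes the dates are day/month/year; that format is inferred from the sample values, not stated in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'Customer': ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd'],
    'Date_of_Purchase': ['10/02/2020', '27/01/2020', '27/04/2020', '26/03/2020',
                         '21/02/2020', '07/06/2020', '12/04/2020', '29/05/2020'],
})

# Assumption: the strings are day/month/year.
df['Date_of_Purchase'] = pd.to_datetime(df['Date_of_Purchase'], format='%d/%m/%Y')

# Sort by customer, then by date, so the counter follows calendar order.
df = df.sort_values(['Customer', 'Date_of_Purchase']).reset_index(drop=True)
df['n_purchase_times'] = df.groupby('Customer').cumcount() + 1
```

With this ordering, customer c's 12/04/2020 purchase becomes number 1 and 27/04/2020 becomes number 2.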