Create a new column based on the date order of another column

Time: 2021-05-23 12:47:02

Tags: python pandas dataframe numpy

I have the following dataframe:

   account_id contract_id date_activated
0           1         AAA     2021-01-05
1           1         ADS     2020-12-12
2           1        ADGD     2021-02-03
3           2         HHA     2021-03-05
4           2        HAKD     2021-03-06
5           3       HADSA     2021-05-01

I want the following result:

   account_id contract_id date_activated Renewal Order
0           1         ADS     2020-12-12      Original
1           1         AAA     2021-01-05           1st
2           1        ADGD     2021-02-03           2nd
3           2         HHA     2021-03-05      Original
4           2        HAKD     2021-03-06           1st
5           3       HADSA     2021-05-01      Original

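The column I want to create is "Renewal Order". Each account can have multiple contracts. The condition is based on each account (account_id) and the order in which its contracts were activated (date_activated). The first contract is labeled "Original", and subsequent contracts are labeled "1st", "2nd", and so on.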

Here is the dictionary for the original dataframe:

{'account_id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3},
 'contract_id': {0: 'AAA',
  1: 'ADS',
  2: 'ADGD',
  3: 'HHA',
  4: 'HAKD',
  5: 'HADSA'},
 'date_activated': {0: Timestamp('2021-01-05 00:00:00'),
  1: Timestamp('2020-12-12 00:00:00'),
  2: Timestamp('2021-02-03 00:00:00'),
  3: Timestamp('2021-03-05 00:00:00'),
  4: Timestamp('2021-03-06 00:00:00'),
  5: Timestamp('2021-05-01 00:00:00')}}
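
For anyone reproducing this, here is a minimal sketch of turning the dictionary above back into a dataframe. It assumes the bare name Timestamp in the printed dictionary refers to pandas.Timestamp; the variable names data and df are only illustrative.

```python
import pandas as pd
from pandas import Timestamp  # assumption: the printed dict's Timestamp is pandas.Timestamp

# Same values as the dictionary in the question, laid out column by column.
data = {
    'account_id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3},
    'contract_id': {0: 'AAA', 1: 'ADS', 2: 'ADGD', 3: 'HHA', 4: 'HAKD', 5: 'HADSA'},
    'date_activated': {
        0: Timestamp('2021-01-05 00:00:00'),
        1: Timestamp('2020-12-12 00:00:00'),
        2: Timestamp('2021-02-03 00:00:00'),
        3: Timestamp('2021-03-05 00:00:00'),
        4: Timestamp('2021-03-06 00:00:00'),
        5: Timestamp('2021-05-01 00:00:00'),
    },
}

# Outer keys become columns, inner keys become the row index.
df = pd.DataFrame(data)
print(df)
```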

Here is the dictionary for the result:

{'account_id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3},
 'contract_id': {0: 'ADS',
  1: 'AAA',
  2: 'ADGD',
  3: 'HHA',
  4: 'HAKD',
  5: 'HADSA'},
 'date_activated': {0: Timestamp('2020-12-12 00:00:00'),
  1: Timestamp('2021-01-05 00:00:00'),
  2: Timestamp('2021-02-03 00:00:00'),
  3: Timestamp('2021-03-05 00:00:00'),
  4: Timestamp('2021-03-06 00:00:00'),
  5: Timestamp('2021-05-01 00:00:00')},
 'Renewal Order': {0: 'Original',
  1: '1st',
  2: '2nd',
  3: 'Original',
  4: '1st',
  5: 'Original'}}

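2 answers:

Answer 0: (score: 1)

Try sort_values to make sure the contracts are in the right order, plus groupby cumcount to number each contract within its account, then use map or apply with a function to convert the numbers to the desired string values: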

def format_order(n):
    if n == 0:
        return 'Original'
    suffix = ['th', 'st', 'nd', 'rd', 'th'][min(n % 10, 4)]
    if 11 <= (n % 100) <= 13:
        suffix = 'th'
    return str(n) + suffix


# Sort so that each account's contracts are in activation order.
df = df.sort_values(['account_id', 'date_activated']).reset_index(drop=True)

# Option 1: apply
df['Renewal Order'] = df.groupby('account_id').cumcount().apply(format_order)

# Option 2: map (gives the same result)
df['Renewal Order'] = df.groupby('account_id').cumcount().map(format_order)

   account_id contract_id date_activated Renewal Order
0           1         ADS     2020-12-12      Original
1           1         AAA     2021-01-05           1st
2           1        ADGD     2021-02-03           2nd
3           2         HHA     2021-03-05      Original
4           2        HAKD     2021-03-06           1st
5           3       HADSA     2021-05-01      Original
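
The sample data only reaches "2nd", so here is a small spot check of how format_order from this answer treats larger counts, in particular the 11 to 13 special case. The function body is copied from above; the list of sample values is just an illustration.

```python
def format_order(n):
    # 0 is the first contract of an account.
    if n == 0:
        return 'Original'
    # Pick the suffix from the last digit; digits 4-9 (and 0) all map to 'th'.
    suffix = ['th', 'st', 'nd', 'rd', 'th'][min(n % 10, 4)]
    # 11, 12 and 13 (also 111, 112, ...) take 'th' regardless of the last digit.
    if 11 <= (n % 100) <= 13:
        suffix = 'th'
    return str(n) + suffix


samples = [0, 1, 2, 3, 4, 10, 11, 12, 13, 21, 22, 23, 101, 111]
labels = [format_order(n) for n in samples]
for n, label in zip(samples, labels):
    print(n, label)
```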

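Answer 1: (score: 0)

We can first get the cumcount by grouping on account_id, then use np.select to supply the conditions: if Renewal Order is 0, replace it with Original, and likewise for the following values.
This can be scaled to 3rd, 4th, and so on.
In case a default value is needed, I have also set default='unOriginal'.

Code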

import numpy as np

df = df.sort_values(['account_id', 'date_activated']).reset_index(drop=True)
df['Renewal Order'] = df.groupby('account_id').cumcount()
conditions = [
    df['Renewal Order']==0,
    df['Renewal Order']==1,
    df['Renewal Order']==2
]
choices = ['Original', '1st', '2nd']
df['Renewal Order'] = np.select(conditions, choices, default='unOriginal')  # counts above 2 fall back to the default label
df

Output

   account_id contract_id date_activated Renewal Order
0           1         ADS     2020-12-12      Original
1           1         AAA     2021-01-05           1st
2           1        ADGD     2021-02-03           2nd
3           2         HHA     2021-03-05      Original
4           2        HAKD     2021-03-06           1st
5           3       HADSA     2021-05-01      Original
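
As a rough sketch of scaling this to 3rd, 4th and so on, the conditions can be generated from a list of labels instead of being written out one by one. The labels list and the order variable are illustrative names, not part of the answer, and the sample dates are entered as strings and parsed with pd.to_datetime for brevity.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'account_id': [1, 1, 1, 2, 2, 3],
    'contract_id': ['AAA', 'ADS', 'ADGD', 'HHA', 'HAKD', 'HADSA'],
    'date_activated': pd.to_datetime([
        '2021-01-05', '2020-12-12', '2021-02-03',
        '2021-03-05', '2021-03-06', '2021-05-01',
    ]),
})

# Put each account's contracts in activation order, then number them from 0.
df = df.sort_values(['account_id', 'date_activated']).reset_index(drop=True)
order = df.groupby('account_id').cumcount()

# Hypothetical label list; extend it as far as the data requires.
labels = ['Original', '1st', '2nd', '3rd', '4th']
# One boolean condition per label: position 0 -> 'Original', 1 -> '1st', ...
conditions = [order == i for i in range(len(labels))]

# Positions beyond the end of the list fall back to the default label.
df['Renewal Order'] = np.select(conditions, labels, default='unOriginal')
print(df)
```

Any account with more contracts than there are labels gets 'unOriginal' for the extras, so the list needs to be at least as long as the largest group.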