I have the following DataFrame:
   account_id contract_id date_activated
0           1         AAA     2021-01-05
1           1         ADS     2020-12-12
2           1        ADGD     2021-02-03
3           2         HHA     2021-03-05
4           2        HAKD     2021-03-06
5           3       HADSA     2021-05-01
I would like the following result:
   account_id contract_id date_activated Renewal Order
0           1         ADS     2020-12-12      Original
1           1         AAA     2021-01-05           1st
2           1        ADGD     2021-02-03           2nd
3           2         HHA     2021-03-05      Original
4           2        HAKD     2021-03-06           1st
5           3       HADSA     2021-05-01      Original
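The column I want to create is "Renewal Order". Each account can have multiple contracts. The label depends on the account (account_id) and on the order in which its contracts were activated (date_activated). The first contract is labeled "Original", and subsequent contracts are labeled "1st", "2nd", and so on.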
Here is the dictionary for the original DataFrame:
{'account_id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3},
 'contract_id': {0: 'AAA',
  1: 'ADS',
  2: 'ADGD',
  3: 'HHA',
  4: 'HAKD',
  5: 'HADSA'},
 'date_activated': {0: Timestamp('2021-01-05 00:00:00'),
  1: Timestamp('2020-12-12 00:00:00'),
  2: Timestamp('2021-02-03 00:00:00'),
  3: Timestamp('2021-03-05 00:00:00'),
  4: Timestamp('2021-03-06 00:00:00'),
  5: Timestamp('2021-05-01 00:00:00')}}
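To reproduce the input, the dictionary above can be passed straight to pd.DataFrame. The following is a minimal sketch; the only addition is the import of Timestamp, which the printed dictionary uses as a bare name, and the variable name data is chosen here for illustration.

```python
import pandas as pd
from pandas import Timestamp  # the dict repr refers to Timestamp without a prefix

# Outer keys become columns, inner keys become the row index
data = {
    'account_id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3},
    'contract_id': {0: 'AAA', 1: 'ADS', 2: 'ADGD',
                    3: 'HHA', 4: 'HAKD', 5: 'HADSA'},
    'date_activated': {0: Timestamp('2021-01-05 00:00:00'),
                       1: Timestamp('2020-12-12 00:00:00'),
                       2: Timestamp('2021-02-03 00:00:00'),
                       3: Timestamp('2021-03-05 00:00:00'),
                       4: Timestamp('2021-03-06 00:00:00'),
                       5: Timestamp('2021-05-01 00:00:00')},
}

df = pd.DataFrame(data)
print(df)
```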
Here is the dictionary for the result:
{'account_id': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3},
 'contract_id': {0: 'ADS',
  1: 'AAA',
  2: 'ADGD',
  3: 'HHA',
  4: 'HAKD',
  5: 'HADSA'},
 'date_activated': {0: Timestamp('2020-12-12 00:00:00'),
  1: Timestamp('2021-01-05 00:00:00'),
  2: Timestamp('2021-02-03 00:00:00'),
  3: Timestamp('2021-03-05 00:00:00'),
  4: Timestamp('2021-03-06 00:00:00'),
  5: Timestamp('2021-05-01 00:00:00')},
 'Renewal Order': {0: 'Original',
  1: '1st',
  2: '2nd',
  3: 'Original',
  4: '1st',
  5: 'Original'}}
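Answer 0 (score: 1)
Try sort_values to make sure the contracts are in the correct order, then groupby with cumcount to get each contract's 0-based position within its account, and finally use map or apply with a function that converts the number into the desired string: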
def format_order(n):
    if n == 0:
        return 'Original'
    suffix = ['th', 'st', 'nd', 'rd', 'th'][min(n % 10, 4)]
    if 11 <= (n % 100) <= 13:
        suffix = 'th'
    return str(n) + suffix
df = df.sort_values(['account_id', 'date_activated']).reset_index(drop=True)
# apply
df['Renewal Order'] = df.groupby('account_id').cumcount().apply(format_order)
or
df = df.sort_values(['account_id', 'date_activated']).reset_index(drop=True)
# map
df['Renewal Order'] = df.groupby('account_id').cumcount().map(format_order)
   account_id contract_id date_activated Renewal Order
0           1         ADS     2020-12-12      Original
1           1         AAA     2021-01-05           1st
2           1        ADGD     2021-02-03           2nd
3           2         HHA     2021-03-05      Original
4           2        HAKD     2021-03-06           1st
5           3       HADSA     2021-05-01      Original
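If the original row order should be kept, one possible variation (not part of the answer above) is to compute cumcount on a sorted copy and let pandas align the result back on the index. The sketch below is self-contained; the sample frame is rebuilt from the question's data, and the variable name order is chosen here for illustration.

```python
import pandas as pd

def format_order(n):
    """Return 'Original' for 0, otherwise n with an English ordinal suffix."""
    if n == 0:
        return 'Original'
    suffix = ['th', 'st', 'nd', 'rd', 'th'][min(n % 10, 4)]
    if 11 <= (n % 100) <= 13:  # 11th, 12th, 13th are irregular
        suffix = 'th'
    return str(n) + suffix

df = pd.DataFrame({
    'account_id': [1, 1, 1, 2, 2, 3],
    'contract_id': ['AAA', 'ADS', 'ADGD', 'HHA', 'HAKD', 'HADSA'],
    'date_activated': pd.to_datetime([
        '2021-01-05', '2020-12-12', '2021-02-03',
        '2021-03-05', '2021-03-06', '2021-05-01',
    ]),
})

# cumcount on the sorted copy keeps the original index labels,
# so assigning the result aligns each label with its original row
order = (
    df.sort_values(['account_id', 'date_activated'])
      .groupby('account_id')
      .cumcount()
)
df['Renewal Order'] = order.map(format_order)
print(df)
```

With this variation the rows stay in their input order, so contract AAA (activated second for account 1) is labeled 1st while remaining in row 0.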
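Answer 1 (score: 0)
We can first group by account_id and take the cumcount, then use np.select to supply the conditions: if Renewal Order is 0, replace it with Original, and likewise for the conditions that follow.
This can be extended to 3rd, 4th, and so on by adding more conditions and choices.
I also set default='unOriginal' in case a default value is needed.
Code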
import numpy as np

# Sort so that cumcount follows activation order within each account
df = df.sort_values(['account_id', 'date_activated']).reset_index(drop=True)
df['Renewal Order'] = df.groupby('account_id').cumcount()
conditions = [
    df['Renewal Order'] == 0,
    df['Renewal Order'] == 1,
    df['Renewal Order'] == 2,
]
choices = ['Original', '1st', '2nd']
df['Renewal Order'] = np.select(conditions, choices, default='unOriginal')  # remove default if not required
df
Output
   account_id contract_id date_activated Renewal Order
0           1         ADS     2020-12-12      Original
1           1         AAA     2021-01-05           1st
2           1        ADGD     2021-02-03           2nd
3           2         HHA     2021-03-05      Original
4           2        HAKD     2021-03-06           1st
5           3       HADSA     2021-05-01      Original
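As a rough illustration of how the np.select approach extends, the sketch below adds a fourth condition for 3rd and shows what happens to a position that has no matching condition. The Series named order is hypothetical data standing in for the cumcount of a single account with five contracts.

```python
import numpy as np
import pandas as pd

# Hypothetical 0-based contract positions for one account with five contracts
order = pd.Series([0, 1, 2, 3, 4])

# One condition per label; a new pair must be added for every extra ordinal
conditions = [order == 0, order == 1, order == 2, order == 3]
choices = ['Original', '1st', '2nd', '3rd']

# Positions without a matching condition (here 4) receive the default label
labels = np.select(conditions, choices, default='unOriginal')
print(labels.tolist())
```

Because every ordinal needs its own condition and choice, this approach is probably best suited to cases where the maximum number of renewals per account is small and known in advance.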