I have a pandas DataFrame, built by the code further below.
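I would like to do the following:
If a row is the first observation of its group, create a new row/record where date2 = date1 - 1 month and all other values are missing; otherwise date2 = date1.
My expected output therefore has one extra row at the start of each group. The data frame is built as follows: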
import pandas as pd
df = pd.DataFrame({'group':['A','A','A', 'B', 'B'],'date1':['12/1/2019','12/1/2019','12/1/2019', '12/1/2022', '12/1/2021'], 'nb_months':[11,11,12, 23, 15], 'col1':[1,1,2, 3, 5]})
# Parse the date strings; unparseable values become NaT (a trailing .dropna() has no effect on assignment)
df['date1'] = pd.to_datetime(df['date1'], format='%m/%d/%Y', errors='coerce')
df
  group      date1  nb_months  col1
0     A 2019-12-01         11     1
1     A 2019-12-01         11     1
2     A 2019-12-01         12     2
3     B 2022-12-01         23     3
4     B 2021-12-01         15     5
Answer 0 (score: 1)
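Use DataFrame.drop_duplicates to keep the first row of each group, add a date2 column equal to date1 minus 1 month, append these rows to the original data with concat, sort by group with sort_values, and finally use reindex to restore the original column order: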
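# First row of each group, with date2 set one month before date1
df1 = (df.drop_duplicates('group')
         .assign(date2=lambda x: x['date1'] - pd.offsets.DateOffset(months=1)))
# Stack the new rows on top of the original rows (where date2 = date1)
df = (pd.concat([df1[['group', 'date2']],
                 df.assign(date2=lambda x: x['date1'])], sort=False)
        .sort_values('group', kind='mergesort')  # stable sort keeps each new row first in its group
        .reindex(columns=df.columns.tolist() + ['date2'])
        .reset_index(drop=True))
print(df)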
  group      date1  nb_months  col1      date2
0     A        NaT        NaN   NaN 2019-11-01
1     A 2019-12-01       11.0   1.0 2019-12-01
2     A 2019-12-01       11.0   1.0 2019-12-01
3     A 2019-12-01       12.0   2.0 2019-12-01
4     B        NaT        NaN   NaN 2022-11-01
5     B 2022-12-01       23.0   3.0 2022-12-01
6     B 2021-12-01       15.0   5.0 2021-12-01
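
In the output above, nb_months and col1 have become floats because the inserted rows contain NaN. If that matters, one possible follow-up is to cast those columns to pandas' nullable Int64 dtype after the concat. The sketch below wraps the same steps in a helper; the function name add_prior_month_rows and the choice to cast to Int64 are assumptions for illustration, not part of the original answer.

```python
import pandas as pd


def add_prior_month_rows(df):
    """Hypothetical helper: add one row per group with date2 = first date1 - 1 month."""
    original_columns = df.columns.tolist()
    # First observation of each group; keep only the group key and the shifted date.
    first_rows = (df.drop_duplicates('group')
                    .assign(date2=lambda x: x['date1'] - pd.DateOffset(months=1))
                    [['group', 'date2']])
    # Existing rows simply copy date1 into date2.
    existing = df.assign(date2=df['date1'])
    return (pd.concat([first_rows, existing], sort=False)
              .sort_values('group', kind='mergesort')  # stable: new row stays first in its group
              .reindex(columns=original_columns + ['date2'])
              .reset_index(drop=True))


df = pd.DataFrame({'group': ['A', 'A', 'A', 'B', 'B'],
                   'date1': ['12/1/2019', '12/1/2019', '12/1/2019', '12/1/2022', '12/1/2021'],
                   'nb_months': [11, 11, 12, 23, 15],
                   'col1': [1, 1, 2, 3, 5]})
df['date1'] = pd.to_datetime(df['date1'], format='%m/%d/%Y')

result = add_prior_month_rows(df)
# Assumption: integer display is wanted, so cast to the nullable integer dtype;
# NaN in the inserted rows becomes <NA>.
result = result.astype({'nb_months': 'Int64', 'col1': 'Int64'})
print(result)
```

The cast is done after the concat so that it does not depend on how concat handles columns that are missing from one of the inputs.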