I have a DataFrame that looks like this:
import pandas as pd

d = {'ID': [0, 1, 2, 3, 4],
     'm1': ['2019-12-06', '2019-12-07', '2019-12-07', '2019-12-06', '2020-12-09'],
     'm2': ['2019-12-07', None, None, '2019-12-07', None],
     'm3': [None, None, None, '2019-12-09', None],
     'm1_m2': [1, 1, 2, 2, 3],
     'm2_m3': [3, 3, 4, 1, 2]}
dat = pd.DataFrame(d)
print(dat)
ID m1 m2 m3 m1_m2 m2_m3
0 0 2019-12-06 2019-12-07 None 1 3
1 1 2019-12-07 None None 1 3
2 2 2019-12-07 None None 2 4
3 3 2019-12-06 2019-12-07 2019-12-09 2 1
4 4 2020-12-09 None None 3 2
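I want to create 2 new fields that estimate m2 and m3.
m2_estimated and m3_estimated should be computed only for rows where m2 or m3, respectively, is missing.
The expected output is: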
ID m1 m2 m3 m1_m2 m2_m3 m2_estimated m3_estimated
0 2019-12-06 2019-12-07 None 1 3 None 2019-12-10
1 2019-12-07 None None 1 3 2019-12-08 2019-12-11
2 2019-12-07 None None 2 4 2019-12-09 2019-12-13
3 2019-12-06 2019-12-07 2019-12-09 2 1 None None
4 2020-12-09 None None 3 2 2019-12-12 2019-12-14
The logic is simple: m3_estimated is m2 plus m2_m3 days, and likewise m2_estimated is m1 plus m1_m2 days.
Answer 0: (score: 1)
df['m2_estimated'] = pd.to_datetime(df['m1']) + pd.to_timedelta(df['m1_m2'], unit='D')
If you don't want a full datetime, you can convert it to a plain date with the dt accessor:
df['m2_estimated'] = df['m2_estimated'].dt.date
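The line above fills m2_estimated for every row, while the expected output in the question leaves it empty where m2 is already known and also asks for m3_estimated. A minimal sketch of that extra logic, assuming a missing m2 should fall back to its own estimate before m2_m3 is added (the intermediate names m2_est, m2_best and m3_est are illustrative, not from the original post):

```python
import pandas as pd

d = {'ID': [0, 1, 2, 3, 4],
     'm1': ['2019-12-06', '2019-12-07', '2019-12-07', '2019-12-06', '2020-12-09'],
     'm2': ['2019-12-07', None, None, '2019-12-07', None],
     'm3': [None, None, None, '2019-12-09', None],
     'm1_m2': [1, 1, 2, 2, 3],
     'm2_m3': [3, 3, 4, 1, 2]}
df = pd.DataFrame(d)

# Parse the date columns; None becomes NaT.
m1 = pd.to_datetime(df['m1'])
m2 = pd.to_datetime(df['m2'])
m3 = pd.to_datetime(df['m3'])

# Estimate m2 as m1 + m1_m2 days, keeping the estimate only where m2 is missing.
m2_est = (m1 + pd.to_timedelta(df['m1_m2'], unit='D')).where(m2.isna())

# Best available m2: the observed value if present, otherwise the estimate.
m2_best = m2.fillna(m2_est)

# Estimate m3 as (best m2) + m2_m3 days, only where m3 is missing.
m3_est = (m2_best + pd.to_timedelta(df['m2_m3'], unit='D')).where(m3.isna())

df['m2_estimated'] = m2_est
df['m3_estimated'] = m3_est
print(df)
```

With the sample data, row 4 comes out as 2020-12-12 and 2020-12-14, because its m1 is 2020-12-09 in the input; the expected output in the question shows 2019 for that row.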
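Answer 1: (score: 1)
df['m2_estimated'] = pd.to_datetime(df['m1']) + pd.to_timedelta(df['m1_m2'], unit='D')
The line above is enough. Make sure m1_m2 holds integers (a number of days). In recent pandas versions the integer column has to be wrapped in pd.to_timedelta, because adding plain integers directly to a datetime column raises a TypeError.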
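If m1_m2 arrives as strings or mixed values, one possible way to meet the integer requirement is to coerce the column first. This is only a sketch: the small frame below, including the 'n/a' entry, is made up for illustration and is not data from the question.

```python
import pandas as pd

# Hypothetical input where the day offsets were read in as strings.
df = pd.DataFrame({'m1': ['2019-12-06', '2019-12-07', '2019-12-07'],
                   'm1_m2': ['1', '2', 'n/a']})

# Convert to numbers; entries that cannot be parsed become NaN instead of raising.
df['m1_m2'] = pd.to_numeric(df['m1_m2'], errors='coerce')

# NaN offsets turn into NaT, so those rows simply get no estimate.
df['m2_estimated'] = pd.to_datetime(df['m1']) + pd.to_timedelta(df['m1_m2'], unit='D')
print(df)
```

Using errors='coerce' is a design choice: it trades a hard failure on bad input for missing estimates, so it may be worth checking how many rows end up as NaT.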