I want to create a new column from another column whose dtype is datetime. Details below:
df['finish']
0 2019-01-28 15:53:48
1 2019-01-28 17:11:15
2 2019-01-28 17:12:14
3 2019-01-28 17:12:15
4 2019-01-28 17:12:41
Name: finish, dtype: datetime64[ns]
df['finish'].map(lambda x: 30 if x<='2019-02-01 21:00:00' else 5)
TypeError: Cannot compare type 'Timestamp' with type 'str'
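Answer 0 (score: 1)
If you compare the whole column at once, the vectorized pandas way, you do not need to convert the string to a datetime yourself, because pandas handles the conversion during the comparison: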
import numpy as np
df['new'] = np.where(df['finish'] <= '2019-02-01 21:00:00', 30, 5)
print(df)
               finish  new
0 2019-01-28 15:53:48   30
1 2019-01-28 17:11:15   30
2 2019-01-28 17:12:14   30
3 2019-01-28 17:12:15   30
4 2019-01-28 17:12:41   30
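For reference, here is a self-contained sketch of the same vectorized idea. It assumes a small sample frame (the last row is made up so that both branches show up) and uses an explicit pd.Timestamp cutoff instead of a string; the name cutoff is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data; the last row is hypothetical, added so the "else" branch is exercised.
df = pd.DataFrame({
    "finish": pd.to_datetime([
        "2019-01-28 15:53:48",
        "2019-01-28 17:11:15",
        "2019-02-03 09:00:00",
    ])
})

# Parse the cutoff once; comparing a datetime64 column with a Timestamp
# is evaluated for the whole column at once.
cutoff = pd.Timestamp("2019-02-01 21:00:00")

# np.where picks 30 where the condition holds and 5 elsewhere.
df["new"] = np.where(df["finish"] <= cutoff, 30, 5)
print(df)
```

Passing a Timestamp rather than a string makes the intended type explicit, which may be easier to read than relying on pandas to parse the string.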
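Your solution failed because map calls the lambda once for every value, so each comparison is between a scalar Timestamp and a plain string, and that raises the TypeError.
Looping like this is also not recommended because it is slow. If you still want to do it this way, convert the string to a Timestamp or datetime first: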
df['new'] = df['finish'].map(lambda x: 30 if x<=pd.Timestamp('2019-02-01 21:00:00') else 5)
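A minimal sketch of what happens per element, assuming a recent pandas version: ordering a scalar Timestamp against a str raises TypeError, while comparing against a parsed Timestamp works. The names ts, cutoff_str, cutoff and raised are illustrative only. The sketch also parses the cutoff once outside the lambda, so it is not rebuilt on every call.

```python
import pandas as pd

ts = pd.Timestamp("2019-01-28 15:53:48")
cutoff_str = "2019-02-01 21:00:00"

# Scalar Timestamp vs. str: the ordering comparison is not supported.
try:
    ts <= cutoff_str
    raised = False
except TypeError:
    raised = True

# Convert the string once, then reuse the Timestamp inside the lambda.
cutoff = pd.Timestamp(cutoff_str)
finish = pd.Series(pd.to_datetime(["2019-01-28 15:53:48", "2019-01-28 17:11:15"]))
new = finish.map(lambda x: 30 if x <= cutoff else 5)
print(raised, new.tolist())
```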
Performance:
#[5000 rows x 1 columns]
df = pd.concat([df] * 1000, ignore_index=True)
In [165]: %timeit df['new1'] = np.where(df['finish'] <='2019-02-01 21:00:00', 30, 5)
465 µs ± 64.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [166]: %timeit df['new2'] = df['finish'].map(lambda x: 30 if x<=pd.Timestamp('2019-02-01 21:00:00') else 5)
22.4 ms ± 228 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
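Outside IPython, a comparison along these lines can be sketched with the standard timeit module. This is only an assumption of how one might reproduce the measurement: the helper names vectorized and elementwise are made up, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Rebuild the 5-row sample and repeat it to 5000 rows, as in the timings above.
base = pd.DataFrame({
    "finish": pd.to_datetime([
        "2019-01-28 15:53:48",
        "2019-01-28 17:11:15",
        "2019-01-28 17:12:14",
        "2019-01-28 17:12:15",
        "2019-01-28 17:12:41",
    ])
})
df = pd.concat([base] * 1000, ignore_index=True)
cutoff = pd.Timestamp("2019-02-01 21:00:00")


def vectorized():
    # One comparison over the whole column.
    return np.where(df["finish"] <= cutoff, 30, 5)


def elementwise():
    # One Python-level lambda call per row.
    return df["finish"].map(lambda x: 30 if x <= cutoff else 5)


# Total seconds for 20 runs of each approach.
t_vec = timeit.timeit(vectorized, number=20)
t_map = timeit.timeit(elementwise, number=20)
print(f"np.where: {t_vec:.4f} s, map: {t_map:.4f} s (20 runs each)")
```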