Question

I want to create a column based on another column whose dtype is datetime. Details below:

df['finish']

0   2019-01-28 15:53:48
1   2019-01-28 17:11:15
2   2019-01-28 17:12:14
3   2019-01-28 17:12:15
4   2019-01-28 17:12:41
Name: finish, dtype: datetime64[ns]

df['finish'].map(lambda x: 30 if x<='2019-02-01 21:00:00' else 5)

TypeError: Cannot compare type 'Timestamp' with type 'str'

Answer 1

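If you compare the whole column at once (the vectorized pandas way), the string value does not need to be converted to a datetime, because pandas handles the conversion itself: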

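import numpy as np
import pandas as pd

# compare the whole datetime64 column with the string; pandas converts the string to a Timestamp
df['new'] = np.where(df['finish'] <= '2019-02-01 21:00:00', 30, 5)
print(df)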
               finish  new
0 2019-01-28 15:53:48   30
1 2019-01-28 17:11:15   30
2 2019-01-28 17:12:14   30
3 2019-01-28 17:12:15   30
4 2019-01-28 17:12:41   30
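A self-contained sketch of the vectorized approach, rebuilding the question's sample data with pd.to_datetime (an assumption: the question does not show how df was built). The last row is a hypothetical timestamp after the cutoff, added only so that both branches of np.where show up in the result.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus one hypothetical row after the cutoff
df = pd.DataFrame({'finish': pd.to_datetime([
    '2019-01-28 15:53:48',
    '2019-01-28 17:11:15',
    '2019-01-28 17:12:14',
    '2019-01-28 17:12:15',
    '2019-01-28 17:12:41',
    '2019-02-02 08:00:00',  # hypothetical, not in the question
])})

# Column-wide comparison: pandas converts the string to a Timestamp
# and returns a boolean Series
mask = df['finish'] <= '2019-02-01 21:00:00'

# np.where picks 30 where the mask is True and 5 elsewhere
df['new'] = np.where(mask, 30, 5)
print(df)
```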

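Your solution fails because map compares scalars: the lambda is called once per value in a loop, and each value is a Timestamp, which cannot be compared with a plain string.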
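A small sketch reproducing the failure, assuming a datetime64 Series built from two of the question's values. It checks what map actually hands to the lambda and then tries the same scalar comparison the question's lambda performs.

```python
import pandas as pd

s = pd.Series(pd.to_datetime(['2019-01-28 15:53:48', '2019-01-28 17:11:15']),
              name='finish')

# Series.map calls the function once per element; record each element's type
element_types = s.map(lambda x: type(x).__name__).tolist()
print(element_types)

# A scalar Timestamp compared with a plain str raises TypeError
# (the exact message differs between pandas versions)
raised = False
try:
    s.iloc[0] <= '2019-02-01 21:00:00'
except TypeError as exc:
    raised = True
    print('TypeError:', exc)
```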

The map approach is also not recommended because it is slow, but it can be fixed by converting the string to a Timestamp or datetime:

df['new'] = df['finish'].map(lambda x: 30 if x<=pd.Timestamp('2019-02-01 21:00:00') else 5)
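A sketch of the same per-row fix with the cutoff built once outside the lambda (the names cutoff and cutoff_dt are hypothetical), so the string is not re-parsed on every call. It also shows that a standard-library datetime works as the cutoff, as stated above. This is still a Python-level loop, so it remains slower than the vectorized version.

```python
from datetime import datetime

import pandas as pd

df = pd.DataFrame({'finish': pd.to_datetime([
    '2019-01-28 15:53:48',
    '2019-01-28 17:11:15',
    '2019-01-28 17:12:14',
])})

# Build the cutoff once instead of calling pd.Timestamp(...) for every row
cutoff = pd.Timestamp('2019-02-01 21:00:00')
df['new_ts'] = df['finish'].map(lambda x: 30 if x <= cutoff else 5)

# A datetime.datetime also compares correctly with each Timestamp
cutoff_dt = datetime(2019, 2, 1, 21, 0, 0)
df['new_dt'] = df['finish'].map(lambda x: 30 if x <= cutoff_dt else 5)

print(df)
```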

Performance:

#[5000 rows x 1 columns]
df = pd.concat([df] * 1000, ignore_index=True)

In [165]: %timeit df['new1'] = np.where(df['finish'] <='2019-02-01 21:00:00', 30, 5)
465 µs ± 64.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [166]: %timeit df['new2'] = df['finish'].map(lambda x: 30 if x<=pd.Timestamp('2019-02-01 21:00:00') else 5)
22.4 ms ± 228 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
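The %timeit lines above are IPython magics. A sketch of a comparable measurement in a plain Python script with the standard timeit module, assuming the 5-row sample frame from the question; the repetition count n is arbitrary, and the printed numbers depend on the machine and the pandas version, so none are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({'finish': pd.to_datetime([
    '2019-01-28 15:53:48',
    '2019-01-28 17:11:15',
    '2019-01-28 17:12:14',
    '2019-01-28 17:12:15',
    '2019-01-28 17:12:41',
])})
# Repeat the 5 sample rows 1000 times -> 5000 rows, as in the answer
df = pd.concat([base] * 1000, ignore_index=True)


def vectorized():
    # One column-wide comparison handled inside pandas/NumPy
    return np.where(df['finish'] <= '2019-02-01 21:00:00', 30, 5)


def per_row():
    # One lambda call (and one pd.Timestamp construction) per row
    return df['finish'].map(
        lambda x: 30 if x <= pd.Timestamp('2019-02-01 21:00:00') else 5)


n = 20  # arbitrary number of repetitions
for name, func in [('np.where', vectorized), ('map + lambda', per_row)]:
    seconds = timeit.timeit(func, number=n) / n
    print(f'{name}: {seconds * 1e3:.3f} ms per call')
```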

How to create a column based on datetime values?
