I have a pandas.DataFrame column that looks like this:
0 2013-07-01 13:20:05.072029
1 2013-07-01 15:49:33.110849
2 2013-07-01 13:39:18.608330
Name: invite_sent_time, dtype: datetime64[ns]
Now I want to create another column, month, that is Jul if the date falls between 2013-07-01 and 2013-08-01, and Aug otherwise.
I did something like the following:
# Creating a column for month.
invites_combined["month"] = np.where(
    (invites_combined.invite_sent_time.dt.date >= pd.Timestamp('2013-07-01'))
    & (invites_combined.invite_sent_time.dt.date < pd.Timestamp('2013-08-01')),
    "July", "Aug")
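But it says a date cannot be compared with a Timestamp. I also can't put the dates directly in quotes, because they are then treated as strings.
So where am I going wrong?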
Answer 0 (score: 2)
You need to call date() on each Timestamp so that dates are compared with dates:
dates = invites_combined.invite_sent_time.dt.date
mask = (dates>=pd.Timestamp('2013-07-01').date()) & (dates<pd.Timestamp('2013-08-01').date())
invites_combined["month"] = np.where(mask,"July","Aug")
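The comparison above can be tried on a small, self-contained frame. This is only a sketch: the three sample rows are made up for illustration (the last one is moved to August so both branches are exercised), and the direct comparison against the datetime64 column shown as `mask_ts` is an alternative that is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the third row is in August on purpose.
invites_combined = pd.DataFrame({
    "invite_sent_time": pd.to_datetime([
        "2013-07-01 13:20:05.072029",
        "2013-07-01 15:49:33.110849",
        "2013-08-01 13:39:18.608330",
    ])
})

# Approach from the answer: compare datetime.date objects with datetime.date objects.
dates = invites_combined["invite_sent_time"].dt.date
start = pd.Timestamp("2013-07-01").date()
end = pd.Timestamp("2013-08-01").date()
mask_dates = (dates >= start) & (dates < end)

# Alternative (an assumption, not from the answer): skip .dt.date and
# compare the datetime64 column with Timestamps directly.
ts = invites_combined["invite_sent_time"]
mask_ts = (ts >= pd.Timestamp("2013-07-01")) & (ts < pd.Timestamp("2013-08-01"))

invites_combined["month"] = np.where(mask_dates, "July", "Aug")
print(invites_combined)
```

Both masks select the same rows here. The direct comparison avoids converting the column to Python date objects, which may matter on large frames.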
Or use between:
mask = invites_combined.invite_sent_time.between('2013-07-01', '2013-08-01')
invites_combined["month"] = np.where(mask ,"July","Aug")
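One detail worth checking with between is how it treats the end points. By default both ends are included, so a value at exactly 2013-08-01 00:00:00 would count as July, unlike the `<` comparison used earlier. The sketch below uses made-up timestamps and assumes pandas 1.3 or later, where `inclusive` accepts the strings "both", "neither", "left" and "right".

```python
import pandas as pd

# Hypothetical values chosen to hit the upper boundary exactly.
times = pd.Series(pd.to_datetime([
    "2013-07-15 10:00:00",
    "2013-08-01 00:00:00",  # exactly on the right end point
    "2013-08-01 13:39:18",
]))

# Default: both end points are included.
closed = times.between("2013-07-01", "2013-08-01")

# Half-open interval [start, end), matching (>= start) & (< end).
half_open = times.between("2013-07-01", "2013-08-01", inclusive="left")

print(closed.tolist())
print(half_open.tolist())
```

For timestamps with a time component the difference only shows up at midnight on the end date, so whether it matters depends on the data.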
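But a better, more general solution is strftime, which derives the abbreviated month name from each value instead of hard-coding a date range: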
invites_combined["month"] = invites_combined.invite_sent_time.dt.strftime('%b')
Sample:
print (invites_combined)
invite_sent_time
0 2013-07-01 13:20:05.072029
1 2013-07-01 15:49:33.110849
2 2013-08-01 13:39:18.608330   <- last date was changed to August
invites_combined["month"] = invites_combined.invite_sent_time.dt.strftime('%b')
print (invites_combined)
invite_sent_time month
0 2013-07-01 13:20:05.072029 Jul
1 2013-07-01 15:49:33.110849 Jul
2 2013-08-01 13:39:18.608330 Aug
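A possible variation, not taken from the answer: the `%b` directive follows the process locale, so the abbreviations may not be English on every machine. The sketch below assumes that `dt.month_name()` returns English names when no locale is passed, and slices the first three letters to get the same Jul/Aug labels. The sample frame is rebuilt here only so the snippet runs on its own.

```python
import pandas as pd

# Same sample values as above, rebuilt so this snippet is self-contained.
invites_combined = pd.DataFrame({
    "invite_sent_time": pd.to_datetime([
        "2013-07-01 13:20:05.072029",
        "2013-07-01 15:49:33.110849",
        "2013-08-01 13:39:18.608330",
    ])
})

# Full English month name ("July", "August"), then the first three letters.
full_names = invites_combined["invite_sent_time"].dt.month_name()
invites_combined["month"] = full_names.str[:3]

print(invites_combined)
```

Neither this nor strftime('%b') keeps the year, so if the data spans several years the month label alone will not distinguish July 2013 from July 2014.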