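I'm not sure whether I'm going about this correctly, but ultimately I'm trying to build a dataset for time series forecasting. I have a "Date" column, but I only want to capture the year, month, day, and hour (ignoring minutes and seconds).
How would I go about doing that? My first thought was to parse out the components I want and then append a new "formatted date" column by building a new date from them. But that seems cumbersome (or I'm doing it wrong).
Raw data: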
ID Date Primary Type Year
111434 23810 02/04/2018 01:36:00 AM HOMICIDE 2018
230458 23811 02/05/2018 01:10:00 AM HOMICIDE 2018
300168 11223630 02/03/2018 02:40:00 PM CRIMINAL DAMAGE 2018
385295 23812 02/06/2018 04:10:00 AM HOMICIDE 2018
484892 23813 02/07/2018 09:23:00 AM HOMICIDE 2018
import pandas as pd

# Parse the "Date" strings once and reuse the result for each component
dates = pd.to_datetime(crime_df['Date'])
crime_df['Hour'] = dates.dt.hour
crime_df['Day'] = dates.dt.day
crime_df['Month'] = dates.dt.month
print(crime_df.head())
ID Date Primary Type Year Hour Day \
111434 23810 02/04/2018 01:36:00 AM HOMICIDE 2018 1 4
230458 23811 02/05/2018 01:10:00 AM HOMICIDE 2018 1 5
300168 11223630 02/03/2018 02:40:00 PM CRIMINAL DAMAGE 2018 14 3
385295 23812 02/06/2018 04:10:00 AM HOMICIDE 2018 4 6
484892 23813 02/07/2018 09:23:00 AM HOMICIDE 2018 9 7
Month
111434 2
230458 2
300168 2
385295 2
484892 2
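What is the best way to create a new column with the minutes and seconds set to zero? I've made several attempts, but none of them worked. For example: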
crime_df['FormattedDate'] = pd.Timestamp((crime_df['Year'], crime_df['Month'], crime_df['Day'], crime_df['Hour']))
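A likely reason this attempt fails is that `pd.Timestamp` expects scalar components rather than whole Series. As a minimal sketch of one alternative, assuming the `Year`, `Month`, `Day` and `Hour` columns shown above, `pd.to_datetime` can assemble datetimes from a DataFrame of component columns. The small `parts` frame below is a hypothetical stand-in for those columns, and the column names are lowercased first to match the documented keys (`year`, `month`, `day`, `hour`):

```python
import pandas as pd

# Hypothetical stand-in for the component columns built in the question.
parts = pd.DataFrame(
    {"Year": [2018, 2018], "Month": [2, 2], "Day": [4, 3], "Hour": [1, 14]}
)

# pd.to_datetime accepts a DataFrame whose columns name the date parts.
# year, month and day are required; hour is optional and defaults to 0.
formatted = pd.to_datetime(parts.rename(columns=str.lower))

print(formatted)
```

With the real data, the same call would be applied to `crime_df[['Year', 'Month', 'Day', 'Hour']]` and the result assigned to `crime_df['FormattedDate']`.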
Then I realized there might be a way to do this directly from the "Date" column, but I couldn't find much help from my friend Google.
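One way to go directly from the "Date" column, sketched here under the assumption that every value follows the `MM/DD/YYYY hh:mm:ss AM/PM` layout seen in the sample rows, is to parse the strings once and round each timestamp down to the hour with `Series.dt.floor`. The two-row `crime_df` below is only a stand-in built from rows in the question:

```python
import pandas as pd

# Stand-in for crime_df, using two rows from the question's sample data.
crime_df = pd.DataFrame(
    {
        "ID": [23810, 11223630],
        "Date": ["02/04/2018 01:36:00 AM", "02/03/2018 02:40:00 PM"],
        "Primary Type": ["HOMICIDE", "CRIMINAL DAMAGE"],
        "Year": [2018, 2018],
    },
    index=[111434, 300168],
)

# Parse once; the explicit format is an assumption based on the sample rows.
parsed = pd.to_datetime(crime_df["Date"], format="%m/%d/%Y %I:%M:%S %p")

# Floor to the hour: minutes and seconds become zero, and the column
# stays a real datetime column rather than a string.
crime_df["FormattedDate"] = parsed.dt.floor("h")

print(crime_df[["Date", "FormattedDate"]])
```

Because `FormattedDate` remains a datetime column, it can be used for grouping or resampling by hour when building the time series.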