I have loaded a pandas DataFrame from a .csv file that contains a column with datetime values.
df = pd.read_csv('data.csv')
The column with the datetime values is named pickup_datetime. If I run df['pickup_datetime'].head(), this is what I get:
0 2009-06-15 17:26:00+00:00
1 2010-01-05 16:52:00+00:00
2 2011-08-18 00:35:00+00:00
3 2012-04-21 04:30:00+00:00
4 2010-03-09 07:51:00+00:00
Name: pickup_datetime, dtype: datetime64[ns, UTC]
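How can I convert this column into a numpy array that holds only the day value of each datetime? For example: 15 from row 0 (2009-06-15 17:26:00+00:00), 05 from row 1 (2010-01-05 16:52:00+00:00), and so on.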
Answer 0 (score: 5)
df['pickup_datetime'] = pd.to_datetime(df['pickup_datetime'], errors='coerce')
df['pickup_datetime'].dt.day.values
# array([15, 5, 18, 21, 9])
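To try this without the original file, here is a minimal self-contained sketch. It assumes an in-memory CSV (built with io.StringIO) as a stand-in for data.csv, using the five rows shown in the question. Passing utc=True is an assumption added here so the parsed column is timezone-aware, as in the question's output.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv, using the rows from the question.
csv_text = """pickup_datetime
2009-06-15 17:26:00+00:00
2010-01-05 16:52:00+00:00
2011-08-18 00:35:00+00:00
2012-04-21 04:30:00+00:00
2010-03-09 07:51:00+00:00
"""

df = pd.read_csv(io.StringIO(csv_text))

# Parse the strings as timezone-aware timestamps; unparseable values become NaT.
df['pickup_datetime'] = pd.to_datetime(
    df['pickup_datetime'], errors='coerce', utc=True
)

# .dt.day gives the day of the month; .to_numpy() returns a numpy array.
days = df['pickup_datetime'].dt.day.to_numpy()
print(days)
```

Note that if errors='coerce' turns any value into NaT, .dt.day yields NaN for that row and the resulting array has a float dtype rather than an integer one.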
Answer 1 (score: 1)
Just adding another variant as a Christmas and New Year bonus :-), although coldspeed has already given a concise answer:
>>> df
pickup_datetime
0 2009-06-15 17:26:00+00:00
1 2010-01-05 16:52:00+00:00
2 2011-08-18 00:35:00+00:00
3 2012-04-21 04:30:00+00:00
4 2010-03-09 07:51:00+00:00
Convert the strings to timestamps by letting pandas infer their format:
>>> df['pickup_datetime'] = pd.to_datetime(df['pickup_datetime'])
>>> df
pickup_datetime
0 2009-06-15 17:26:00
1 2010-01-05 16:52:00
2 2011-08-18 00:35:00
3 2012-04-21 04:30:00
4 2010-03-09 07:51:00
You can extract just the day from pickup_datetime:
>>> df['pickup_datetime'].dt.day
0 15
1 5
2 18
3 21
4 9
Name: pickup_datetime, dtype: int64
You can extract just the month from pickup_datetime:
>>> df['pickup_datetime'].dt.month
0 6
1 1
2 8
3 4
4 3
You can extract just the year from pickup_datetime:
>>> df['pickup_datetime'].dt.year
0 2009
1 2010
2 2011
3 2012
4 2010
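If all three components are needed at once as numpy data, one possible sketch is to collect them into a small DataFrame and convert it in a single step. The sample timestamps are the ones from the question; the names s, parts and arr are illustrative only.

```python
import pandas as pd

# Sample timestamps from the question, parsed as timezone-aware values.
s = pd.Series(pd.to_datetime([
    '2009-06-15 17:26:00+00:00',
    '2010-01-05 16:52:00+00:00',
    '2011-08-18 00:35:00+00:00',
    '2012-04-21 04:30:00+00:00',
    '2010-03-09 07:51:00+00:00',
], utc=True))

# One column per component, taken from the .dt accessor.
parts = pd.DataFrame({
    'day': s.dt.day,
    'month': s.dt.month,
    'year': s.dt.year,
})

# 2-D numpy array with one row per timestamp: [day, month, year].
arr = parts.to_numpy()
print(arr)
```

A single column can then be sliced out with ordinary numpy indexing, for example arr[:, 0] for the days.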