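Hello, I am trying to convert the dates in my DataFrame into a format I can extract useful information from. The dataset has a 'week' feature in DD/MM/YY format, as shown below: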
In [128]: df_train[['week', 'units_sold']]
Out[128]:
week units_sold
0 17/01/11 20
1 17/01/11 28
2 17/01/11 19
3 17/01/11 44
4 17/01/11 52
I converted the dates as follows:
df_train['new_date'] = pd.to_datetime(df_train['week'])
new_date units_sold
0 2011-01-17 20.0
1 2011-01-17 28.0
2 2011-01-17 19.0
3 2011-01-17 44.0
4 2011-01-17 52.0
Using the 'new_date' feature I created, I extracted some information as follows:
df_train['weekday'] = df_train['new_date'].dt.weekofyear #week of the year (a week number, despite the column name)
df_train['QTR'] = df_train['new_date'].apply(lambda x: x.quarter) #current quarter of the year
df_train['month'] = df_train['new_date'].apply(lambda x: x.month) #current month
df_train['year'] = df_train['new_date'].dt.year #current year
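However, when I looked at my data I ran into some errors. For example, one of the dates in my dataset is 07/02/11, which should give a month of 2. Instead, my parsing shows the month as 7, which I know is incorrect; see entry 3483: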
Out[127]:
week month
18 17/01/11 1
1173 24/01/11 1
2328 31/01/11 1
3483 07/02/11 7
4638 14/02/11 2
Can anyone tell me where I went wrong? Any help is appreciated!
Answer 0: (score: 2)
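Use the dayfirst=True parameter, so that ambiguous dates such as 07/02/11 are read as day/month instead of month/day (17/01/11 was parsed correctly only because 17 cannot be a month):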
df_train['new_date'] = pd.to_datetime(df_train['week'], dayfirst=True)
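As a rough illustration, here is a minimal sketch using a small hypothetical Series in place of df_train['week'] (the sample values are taken from the question, the variable names are made up). It also shows an explicit format string, which is an alternative to dayfirst=True and not part of the original answer; it leaves nothing for the parser to guess.

```python
import pandas as pd

# Hypothetical sample standing in for df_train['week'] (DD/MM/YY strings).
week = pd.Series(["17/01/11", "07/02/11", "14/02/11"])

# dayfirst=True: ambiguous strings such as 07/02/11 are read as day/month.
by_dayfirst = pd.to_datetime(week, dayfirst=True)

# Alternative: spell the layout out, so no guessing is involved at all.
by_format = pd.to_datetime(week, format="%d/%m/%y")

print(by_format.dt.month.tolist())  # [1, 2, 2]
```

Both calls give the same dates for this sample; the explicit format is simply stricter about input that does not match DD/MM/YY.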
Then use the .dt accessor for better performance, because apply is a loop under the hood:
df_train['weekday'] = df_train['new_date'].dt.weekofyear #week of the year; .dt.weekofyear was removed in pandas 2.0, use .dt.isocalendar().week there
df_train['QTR'] = df_train['new_date'].dt.quarter #current quarter of the year
df_train['month'] = df_train['new_date'].dt.month #current month
df_train['year'] = df_train['new_date'].dt.year
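Put together, the extraction might look like the sketch below. It runs on a small hypothetical DataFrame, not the real df_train, and the column name week_of_year is my own choice. It assumes pandas 1.1 or newer, where .dt.isocalendar() is available, and uses that in place of .dt.weekofyear.

```python
import pandas as pd

# Hypothetical stand-in for df_train, using dates from the question.
df = pd.DataFrame({"week": ["17/01/11", "07/02/11", "14/02/11"]})
df["new_date"] = pd.to_datetime(df["week"], format="%d/%m/%y")

# Vectorised .dt accessors: no Python-level loop over the rows.
df["month"] = df["new_date"].dt.month
df["QTR"] = df["new_date"].dt.quarter
df["year"] = df["new_date"].dt.year

# ISO week number; this replaces .dt.weekofyear on newer pandas versions.
df["week_of_year"] = df["new_date"].dt.isocalendar().week.astype(int)

# The apply version returns the same values, it is only slower on large data.
month_via_apply = df["new_date"].apply(lambda x: x.month)

print(df[["week", "month", "QTR", "year", "week_of_year"]])
```

Entry 07/02/11 now yields month 2, as expected.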