Python pandas DataFrame: function to split a datetime column

Time: 2018-08-11 11:46:58

Tags: python pandas datetime dataframe

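Is there an existing library that can split a datetime column into columns that each hold a single component, such as year, month, day, hour, minute and so on?

I am doing this to preprocess data that I plan to use for machine learning (the Kaggle New York City taxi fare dataset).

This is what the datetime column in the dataset looks like: (image: Datetime Data)

I can already do this with the following: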

df_raw["pickup_year"] = df_raw['pickup_datetime'].dt.year
df_raw["pickup_month"] = df_raw['pickup_datetime'].dt.month
df_raw["pickup_day"] = df_raw['pickup_datetime'].dt.day
df_raw["pickup_hour"] = df_raw['pickup_datetime'].dt.hour
df_raw["pickup_minute"] = df_raw['pickup_datetime'].dt.minute
df_raw["pickup_second"] = df_raw['pickup_datetime'].dt.second
df_raw["pickup_dayofyear"] = df_raw['pickup_datetime'].dt.dayofyear
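# Note: .dt.week and .dt.weekofyear were deprecated in pandas 1.1 and removed in 2.0;
# on newer versions use .dt.isocalendar().week instead.
df_raw["pickup_week"] = df_raw['pickup_datetime'].dt.week
df_raw["pickup_weekofyear"] = df_raw['pickup_datetime'].dt.weekofyear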
df_raw["pickup_dayofweek"] = df_raw['pickup_datetime'].dt.dayofweek
df_raw["pickup_weekday"] = df_raw['pickup_datetime'].dt.weekday
df_raw["pickup_quarter"] = df_raw['pickup_datetime'].dt.quarter
df_raw.head()

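But surely this has already been done in some library somewhere?

2 answers:

Answer 0: (score: 1)

You can loop over a list of attribute names and create the new columns with getattr: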

# 'week' and 'weekofyear' only exist on the .dt accessor before pandas 2.0
L = ['year', 'month', 'day', 'hour', 'minute', 'second', 'dayofyear',
     'week', 'weekofyear', 'dayofweek', 'weekday', 'quarter']

for i in L:
    df[i] = getattr(df['Dates'].dt, i)
# sample data taken from jpp's answer
print(df)
                Dates  year  month  day  hour  minute  second  dayofyear  \
0 2017-12-11 01:00:00  2017     12   11     1       0       0        345   
1 2017-12-12 01:00:01  2017     12   12     1       0       1        346   
2 2019-05-12 15:15:00  2019      5   12    15      15       0        132   
3 2019-06-22 03:25:14  2019      6   22     3      25      14        173   
4 2020-05-11 04:40:02  2020      5   11     4      40       2        132   
5 2020-11-30 01:00:00  2020     11   30     1       0       0        335   

   week  weekofyear  dayofweek  weekday  quarter  
0    50          50          0        0        4  
1    50          50          1        1        4  
2    19          19          6        6        2  
3    25          25          5        5        2  
4    20          20          0        0        2  
5    49          49          0        0        4  
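
On pandas 2.0 and later the loop above raises AttributeError for 'week' and 'weekofyear', because those accessor attributes were removed. A minimal sketch of the same idea for newer versions follows; it reuses two rows of the sample data and takes the ISO week number from .dt.isocalendar(). The cast to int64 is an assumption that the column has no missing timestamps.

```python
import pandas as pd

# Two rows taken from the sample data in this thread
df = pd.DataFrame({'Dates': pd.to_datetime(['2017-12-11 01:00:00',
                                            '2019-05-12 15:15:00'])})

# Attributes that are still available on the .dt accessor
attrs = ['year', 'month', 'day', 'hour', 'minute', 'second',
         'dayofyear', 'dayofweek', 'quarter']

for name in attrs:
    df[name] = getattr(df['Dates'].dt, name)

# ISO week number, the replacement for the removed .dt.week / .dt.weekofyear.
# isocalendar() returns UInt32 columns; the cast assumes no missing values.
df['week'] = df['Dates'].dt.isocalendar().week.astype('int64')

print(df)
```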

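Answer 1: (score: 0)

The attributes you list are all derived from the integer array that underlies a datetime series. So while there may be a pandas-specific way to extract several attributes at once, it is unlikely to be more efficient than mapping over a list or dictionary. Here is a solution using pd.concat.

Setup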

import pandas as pd

df = pd.DataFrame({'Dates': ['2017-12-11 01:00:00', '2017-12-12 01:00:01',
                             '2019-05-12 15:15:00', '2019-06-22 03:25:14',
                             '2020-05-11 04:40:02', '2020-11-30 01:00:00']})

df['Dates'] = pd.to_datetime(df['Dates'])

Solution

# 'week' and 'weekofyear' only exist on the .dt accessor before pandas 2.0
L = ['year', 'month', 'day', 'hour', 'minute', 'second', 'dayofyear',
     'week', 'weekofyear', 'dayofweek', 'weekday', 'quarter']

df = df.join(pd.concat([getattr(df['Dates'].dt, i).rename(i) for i in L], axis=1))

Result

print(df)

                Dates  year  month  day  hour  minute  second  dayofyear  \
0 2017-12-11 01:00:00  2017     12   11     1       0       0        345   
1 2017-12-12 01:00:01  2017     12   12     1       0       1        346   
2 2019-05-12 15:15:00  2019      5   12    15      15       0        132   
3 2019-06-22 03:25:14  2019      6   22     3      25      14        173   
4 2020-05-11 04:40:02  2020      5   11     4      40       2        132   
5 2020-11-30 01:00:00  2020     11   30     1       0       0        335   

   week  weekofyear  dayofweek  weekday  quarter  
0    50          50          0        0        4  
1    50          50          1        1        4  
2    19          19          6        6        2  
3    25          25          5        5        2  
4    20          20          0        0        2  
5    49          49          0        0        4
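
Since the question asks for a ready-made function, the dictionary-mapping idea could be wrapped up roughly as follows. This is only a sketch: split_datetime, its parameters and the default column prefix are hypothetical names chosen for illustration, not part of pandas. It is restricted to attributes that exist on the .dt accessor in the installed pandas version.

```python
import pandas as pd

def split_datetime(df, column,
                   parts=('year', 'month', 'day', 'hour', 'minute', 'second'),
                   prefix=None):
    """Return a copy of df with one new column per datetime component.

    Hypothetical helper: `parts` must name attributes of the .dt accessor.
    """
    if prefix is None:
        prefix = f'{column}_'
    accessor = df[column].dt
    # Build all new columns in a dictionary, then add them in one step
    new_cols = {prefix + part: getattr(accessor, part) for part in parts}
    return df.assign(**new_cols)

# Two rows taken from the sample data in this thread
df = pd.DataFrame({'Dates': pd.to_datetime(['2017-12-11 01:00:00',
                                            '2020-05-11 04:40:02'])})
out = split_datetime(df, 'Dates', prefix='')
print(out)
```

Because assign returns a new DataFrame, the input frame is left unchanged, which makes the helper easier to reuse inside a preprocessing pipeline.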