Convert day of year to datetime

Time: 2020-10-23 12:09:40

Tags: python pandas datetime

I have a data file containing year, day of year (DOY), hour, and minute columns, as follows:

            BuoyID  Year  Hour  Min       DOY   POS_DOY     Lat     Lon     Ts
0  300234065718160  2019     7    0  216.2920  216.2920  58.559 -23.914  14.61
1  300234065718160  2019     9    0  216.3750  216.3750  58.563 -23.905  14.60
2  300234065718160  2019    10    0  216.4170  216.4170  58.564 -23.903  14.60
3  300234065718160  2019    11    0  216.4580  216.4580  58.563 -23.906  14.60
4  300234065718160  2019    12    0  216.5000  216.5000  58.561 -23.910  14.60

To build the datetime I used:

dt_raw = pd.to_datetime(df_buoy['Year'] * 1000 + df_buoy['DOY'], format='%Y%j')

# Convert to datetime
dt_buoy = [d.date() for d in dt_raw]
date = datetime.datetime.combine(dt_buoy[0], datetime.time(df_buoy.Hour[0], df_buoy.Min[0]))

My problem arises when the hour is a float rather than an int. For example:

            BuoyID  Year   Hour  Min      DOY  POS_DOY       Lat       Lon      BP    Ts
0  300234061876910  2014  23.33    0  226.972  226.972  71.93081 -141.0792  1016.9 -0.01
1  300234061876910  2014  23.50    0  226.979  226.979  71.93020 -141.0826  1016.8  3.36
2  300234061876910  2014  23.67    0  226.986  226.986  71.92968 -141.0856  1016.8  3.28
3  300234061876910  2014  23.83    0  226.993  226.993  71.92934 -141.0876  1016.8  3.22
4  300234061876910  2014   0.00    0  227.000  227.000  71.92904 -141.0894  1016.8  3.18

What I wanted to do was convert the hour to a str, take the first two characters to get the whole hour, then subtract that from Hour and multiply by 60 to get the minutes:

int_hour = [(int(str(i)[0:2])) for i in df_buoy.Hour]
minutes = map(lambda x, y: (x - y)*60, df_buoy.Hour, int_hour)

But of course, when the hour is 0.00 the first two characters are '0.', and Python complains because int('0.') raises a ValueError.

My question is: does anyone know a simple way to convert year, DOY, hour (int or float), and minute into a datetime?

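1 answer:

Answer 0: (score: 1)

Use to_timedelta to convert the Hour column to a duration and add it to the parsed date. This works for both integer and float hours: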

import pandas as pd

# Parse Year and day of year as a date, then add Hour as a timedelta
df['d'] = (pd.to_datetime(df['Year'] * 1000 + df['DOY'], format='%Y%j') +
           pd.to_timedelta(df['Hour'], unit='h'))

print(df)
            BuoyID  Year  Hour  Min      DOY  POS_DOY     Lat     Lon     Ts  \
0  300234065718160  2019     7    0  216.292  216.292  58.559 -23.914  14.61   
1  300234065718160  2019     9    0  216.375  216.375  58.563 -23.905  14.60   
2  300234065718160  2019    10    0  216.417  216.417  58.564 -23.903  14.60   
3  300234065718160  2019    11    0  216.458  216.458  58.563 -23.906  14.60   
4  300234065718160  2019    12    0  216.500  216.500  58.561 -23.910  14.60   

                    d  
0 2019-08-04 07:00:00  
1 2019-08-04 09:00:00  
2 2019-08-04 10:00:00  
3 2019-08-04 11:00:00  
4 2019-08-04 12:00:00  
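
The reason float hours need no special handling is that to_timedelta treats each value as a duration, so 23.33 hours becomes 23 hours, 19 minutes and 48 seconds without any string slicing. The following is a minimal sketch of that arithmetic, using the fractional hours from the question; the names hours, delta and parts are illustrative only, and the rounding to whole seconds is an added assumption to guard against floating-point noise.

```python
import pandas as pd

# Fractional hours taken from the question's second data set.
hours = pd.Series([23.33, 23.50, 23.67, 23.83, 0.00])

# Interpret each value as a duration in hours, rounded to whole seconds.
delta = pd.to_timedelta(hours, unit="h").dt.round("s")

# Split each duration into whole hours, minutes and seconds.
parts = delta.dt.components[["hours", "minutes", "seconds"]]
print(parts)
```

If separate hour and minute columns are still wanted, parts["hours"] and parts["minutes"] provide them, including for the 0.00 row that broke the string-based approach.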

The same expression applied to the data with float hours:

df['d'] = (pd.to_datetime(df['Year'] * 1000 + df['DOY'], format='%Y%j') +
           pd.to_timedelta(df['Hour'], unit='h'))

print(df)
            BuoyID  Year   Hour  Min      DOY  POS_DOY       Lat       Lon  \
0  300234061876910  2014  23.33    0  226.972  226.972  71.93081 -141.0792   
1  300234061876910  2014  23.50    0  226.979  226.979  71.93020 -141.0826   
2  300234061876910  2014  23.67    0  226.986  226.986  71.92968 -141.0856   
3  300234061876910  2014  23.83    0  226.993  226.993  71.92934 -141.0876   
4  300234061876910  2014   0.00    0  227.000  227.000  71.92904 -141.0894   

       BP    Ts                   d  
0  1016.9 -0.01 2014-08-14 23:19:48  
1  1016.8  3.36 2014-08-14 23:30:00  
2  1016.8  3.28 2014-08-14 23:40:12  
3  1016.8  3.22 2014-08-14 23:49:48  
4  1016.8  3.18 2014-08-15 00:00:00
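
The question also asks about the Min column, and the DOY values carry a fractional part. It is not obvious that every pandas version accepts a value such as 2014226.972 under format='%Y%j', so the sketch below is a variant that floors DOY explicitly and adds the minutes as a second timedelta. It is an alternative under those assumptions, not the code from the answer above; the names year_start and day_offset are illustrative, and the sample rows are copied from the question with only the relevant columns kept.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question's float-hour data (subset of columns).
df = pd.DataFrame({
    "Year": [2014, 2014, 2014, 2014, 2014],
    "Hour": [23.33, 23.50, 23.67, 23.83, 0.00],
    "Min": [0, 0, 0, 0, 0],
    "DOY": [226.972, 226.979, 226.986, 226.993, 227.000],
})

# January 1st of each year, parsed from a plain four-digit string.
year_start = pd.to_datetime(df["Year"].astype(str), format="%Y")

# Whole day of year; DOY 1 is January 1st, hence the "- 1".
day_offset = pd.to_timedelta(np.floor(df["DOY"]).astype("int64") - 1, unit="D")

# Date plus hours (int or float) plus minutes.
df["d"] = (year_start
           + day_offset
           + pd.to_timedelta(df["Hour"], unit="h")
           + pd.to_timedelta(df["Min"], unit="min"))

print(df["d"])
```

Flooring matters here because the fractional part of DOY already encodes the time of day; adding Hour on top of an unfloored DOY would count that time twice.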