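How to convert a datetime column from a non-UTC time zone to UTC without losing data at the daylight-saving time changes in R

Time: 2019-04-06 23:21:33

Tags: r

I have a data frame df1 with a datetime column in UTC. I need to merge this data frame with the data frame df2 by the datetime column. My problem is that df2 is in Europe/Paris time, and when I convert df2$datetime from Europe/Paris to UTC, I lose or duplicate data at the moments when the clocks change from summer to winter time or from winter to summer time. For example: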

# Dates in March use 2017 so that they match the printed output below
df1 <- data.frame(datetime = c("2016-10-29 22:00:00", "2016-10-29 23:00:00", "2016-10-30 00:00:00", "2016-10-30 01:00:00", "2016-10-30 02:00:00", "2016-10-30 03:00:00", "2016-10-30 04:00:00", "2016-10-30 05:00:00", "2017-03-25 22:00:00", "2017-03-25 23:00:00", "2017-03-26 00:00:00", "2017-03-26 01:00:00", "2017-03-26 02:00:00", "2017-03-26 03:00:00", "2017-03-26 04:00:00"), Var1 = c(4, 56, 76, 54, 34, 3, 4, 6, 78, 23, 12, 3, 5, 6, 7))
df1$datetime <- as.POSIXct(df1$datetime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
df2 <- data.frame(datetime = c("2016-10-29 22:00:00", "2016-10-29 23:00:00", "2016-10-30 00:00:00", "2016-10-30 01:00:00", "2016-10-30 02:00:00", "2016-10-30 03:00:00", "2016-10-30 04:00:00", "2016-10-30 05:00:00", "2017-03-25 22:00:00", "2017-03-25 23:00:00", "2017-03-26 00:00:00", "2017-03-26 01:00:00", "2017-03-26 02:00:00", "2017-03-26 03:00:00", "2017-03-26 04:00:00"), Var2 = c(56, 43, 23, 14, 51, 27, 89, 76, 56, 4, 35, 23, 4, 62, 84))
df2$datetime <- as.POSIXct(df2$datetime, format = "%Y-%m-%d %H:%M:%S", tz = "Europe/Paris")

df1
              datetime Var1
1  2016-10-29 22:00:00    4
2  2016-10-29 23:00:00   56
3  2016-10-30 00:00:00   76
4  2016-10-30 01:00:00   54
5  2016-10-30 02:00:00   34
6  2016-10-30 03:00:00    3
7  2016-10-30 04:00:00    4
8  2016-10-30 05:00:00    6
9  2017-03-25 22:00:00   78
10 2017-03-25 23:00:00   23
11 2017-03-26 00:00:00   12
12 2017-03-26 01:00:00    3
13 2017-03-26 02:00:00    5
14 2017-03-26 03:00:00    6
15 2017-03-26 04:00:00    7

df2
              datetime Var2
1  2016-10-29 22:00:00   56
2  2016-10-29 23:00:00   43
3  2016-10-30 00:00:00   23
4  2016-10-30 01:00:00   14
5  2016-10-30 02:00:00   51
6  2016-10-30 03:00:00   27
7  2016-10-30 04:00:00   89
8  2016-10-30 05:00:00   76
9  2017-03-25 22:00:00   56
10 2017-03-25 23:00:00    4
11 2017-03-26 00:00:00   35
12 2017-03-26 01:00:00   23
13 2017-03-26 02:00:00    4
14 2017-03-26 03:00:00   62
15 2017-03-26 04:00:00   84

When I change df2$datetime from Europe/Paris to UTC, this happens:

library(lubridate)
df2$datetime<-with_tz(df2$datetime,"UTC")

df2
              datetime Var2
1  2016-10-29 20:00:00   56
2  2016-10-29 21:00:00   43
3  2016-10-29 22:00:00   23
4  2016-10-29 23:00:00   14
5  2016-10-30 00:00:00   51
6  2016-10-30 02:00:00   27 # Data at 01:00:00 is missing
7  2016-10-30 03:00:00   89
8  2016-10-30 04:00:00   76
9  2017-03-25 21:00:00   56
10 2017-03-25 22:00:00    4
11 2017-03-25 23:00:00   35
12 2017-03-26 00:00:00   23
13 2017-03-26 00:00:00    4 # There is a duplicate at 00:00:00
14 2017-03-26 01:00:00   62
15 2017-03-26 02:00:00   84
16 2017-03-26 03:00:00   56

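Is there another way to convert df2$datetime from Europe/Paris to UTC that lets me merge the two data frames without losing or duplicating data? I don't understand why I have to lose or duplicate information in df2.

Am I doing the right conversion on df2$datetime in order to merge this data frame with df1? What I have done so far to work around the problem is to add a new row to df2 at 01:00:00 on 2016-10-30, holding the mean of the values at 2016-10-30 00:00:00 and 2016-10-30 02:00:00, and to remove one of the rows at 2017-03-26 00:00:00.

Thanks for your help.

2 answers:

Answer 0: (score: 0)

I found that my original df2 should look like this: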

df2
              datetime Var1
1  2016-10-29 22:00:00    4 # This is time in format "GMT+2". It corresponds to 20:00 UTC
2  2016-10-29 23:00:00   56 # This is time in format "GMT+2". It corresponds to 21:00 UTC
3  2016-10-30 00:00:00   76 # This is time in format "GMT+2". It corresponds to 22:00 UTC
4  2016-10-30 01:00:00   54 # This is time in format "GMT+2". It corresponds to 23:00 UTC
5  2016-10-30 02:00:00   34 # This is time in format "GMT+2". It corresponds to 00:00 UTC
6  2016-10-30 02:00:00    3 # This is time in format "GMT+1". It corresponds to 01:00 UTC
7  2016-10-30 03:00:00    4 # This is time in format "GMT+1". It corresponds to 02:00 UTC
8  2016-10-30 04:00:00    6 # This is time in format "GMT+1". It corresponds to 03:00 UTC
9  2016-10-30 05:00:00   78 # This is time in format "GMT+1". It corresponds to 04:00 UTC
10 2017-03-25 22:00:00   23 # This is time in format "GMT+1". It corresponds to 21:00 UTC 
11 2017-03-25 23:00:00   12 # This is time in format "GMT+1". It corresponds to 22:00 UTC 
12 2017-03-26 00:00:00    3 # This is time in format "GMT+1". It corresponds to 23:00 UTC 
13 2017-03-26 01:00:00    5 # This is time in format "GMT+1". It corresponds to 00:00 UTC 
14 2017-03-26 03:00:00    6 # This is time in format "GMT+2". It corresponds to 01:00 UTC 
15 2017-03-26 04:00:00    7 # This is time in format "GMT+2". It corresponds to 02:00 UTC 
16 2017-03-26 05:00:00   76 # This is time in format "GMT+2". It corresponds to 03:00 UTC 
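
The pattern in the table above can be reproduced by starting from evenly spaced UTC instants and printing them as Europe/Paris clock times. The following is a minimal base R sketch; the variable names utc, local_clock and offset are illustrative and not part of the original data. It shows that the UTC series has no gap or repeat, while the local clock reading 02:00 appears twice on the night of the autumn change.

```r
# Four consecutive hourly instants in UTC around the autumn clock change
utc <- seq(as.POSIXct("2016-10-29 23:00:00", tz = "UTC"),
           by = "hour", length.out = 4)

# The same instants printed as Europe/Paris clock times
local_clock <- format(utc, "%Y-%m-%d %H:%M", tz = "Europe/Paris")

# The UTC offset in force at each instant
offset <- format(utc, "%z", tz = "Europe/Paris")

print(data.frame(utc = format(utc, "%Y-%m-%d %H:%M"), local_clock, offset))
```

A column recorded on a real Europe/Paris clock would therefore contain a repeated 02:00 in October and no 02:00 in March; a column without either feature was probably recorded at a fixed offset.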

However, my original df2 has no duplicated or missing times. It looks like this:

df2
              datetime Var1
1  2016-10-29 22:00:00    4
2  2016-10-29 23:00:00   56
3  2016-10-30 00:00:00   76
4  2016-10-30 01:00:00   54
5  2016-10-30 02:00:00   34
6  2016-10-30 03:00:00    3
7  2016-10-30 04:00:00    4
8  2016-10-30 05:00:00    6
9  2017-03-25 22:00:00   78
10 2017-03-25 23:00:00   23
11 2017-03-26 00:00:00   12
12 2017-03-26 01:00:00    3
13 2017-03-26 02:00:00    5
14 2017-03-26 03:00:00    6
15 2017-03-26 04:00:00    7
16 2017-03-26 05:00:00   76

When I apply the R code df2$datetime<-with_tz(df2$datetime,"UTC"), this happens:

df2
              datetime Var1
1  2016-10-29 20:00:00    4
2  2016-10-29 21:00:00   56
3  2016-10-29 22:00:00   76
4  2016-10-29 23:00:00   54
5  2016-10-30 00:00:00   34
6  2016-10-30 02:00:00    3 # I have to manually add a new row between the times "00:00" and "02:00"
7  2016-10-30 03:00:00    4
8  2016-10-30 04:00:00    6
9  2017-03-25 21:00:00   78
10 2017-03-25 22:00:00   23
11 2017-03-25 23:00:00   12
12 2017-03-26 00:00:00    3
13 2017-03-26 01:00:00    5 # I have to manually remove one of the rows referring to the time "01:00".
14 2017-03-26 01:00:00    6
15 2017-03-26 02:00:00    7
16 2017-03-26 03:00:00   76

If my original df2 had a duplicate at "02:00:00" on October 30 and a gap between "01:00" and "03:00" on March 26, I would accept the result of the R code df2$datetime<-with_tz(df2$datetime,"UTC"):

df2
              datetime Var1
1  2016-10-29 20:00:00    4
2  2016-10-29 21:00:00   56
3  2016-10-29 22:00:00   76
4  2016-10-29 23:00:00   54
5  2016-10-30 00:00:00   34
6  2016-10-30 00:00:00    3 # I just have to change "00:00:00" for "01:00"
7  2016-10-30 02:00:00    4
8  2016-10-30 03:00:00    6
9  2016-10-30 04:00:00   78
10 2017-03-25 21:00:00   23
11 2017-03-25 22:00:00   12
12 2017-03-25 23:00:00    3
13 2017-03-26 00:00:00    5
14 2017-03-26 01:00:00    6
15 2017-03-26 02:00:00    7
16 2017-03-26 03:00:00   76
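
One way to check whether a column of clock times can really be Europe/Paris local time is to look for timestamps that do not exist in that zone. The sketch below uses base R only and assumes the times are still available as character strings; stamps, fmt, parsed, round_trip and nonexistent are illustrative names. A clock time that exists in the zone survives parsing and re-formatting unchanged, while a time inside the spring gap comes back as NA or as a different time, depending on the platform.

```r
# Clock times as recorded; treating them as Europe/Paris is the assumption under test
stamps <- c("2017-03-26 01:00:00", "2017-03-26 02:00:00", "2017-03-26 03:00:00")
fmt <- "%Y-%m-%d %H:%M:%S"

parsed <- as.POSIXct(stamps, format = fmt, tz = "Europe/Paris")

# Re-format in the same zone and compare with the input strings
round_trip <- format(parsed, fmt)
nonexistent <- is.na(parsed) | round_trip != stamps

print(data.frame(stamps, nonexistent))
```

If any row is flagged, the column contains a clock time that a Europe/Paris clock never showed, which suggests the data were recorded without daylight saving time.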

Answer 1: (score: 0)

# As there are several versions of df2, I use the one shown in the question
df2 <- read.table(text = "
              datetime Var2
1  '2016-10-29 22:00:00'   56
2  '2016-10-29 23:00:00'   43
3  '2016-10-30 00:00:00'   23
4  '2016-10-30 01:00:00'   14
5  '2016-10-30 02:00:00'   51
6  '2016-10-30 03:00:00'   27
7  '2016-10-30 04:00:00'   89
8  '2016-10-30 05:00:00'   76
9  '2017-03-25 22:00:00'   56
10 '2017-03-25 23:00:00'    4
11 '2017-03-26 00:00:00'   35
12 '2017-03-26 01:00:00'   23
13 '2017-03-26 02:00:00'    4
14 '2017-03-26 03:00:00'   62
15 '2017-03-26 04:00:00'   84
", header = TRUE)

library(lubridate)

# As soon as you set the time zone, the content of df2 is already changed
df2$datetimeEP <- as.POSIXct(df2$datetime, format = "%Y-%m-%d %H", tz= "Europe/Paris")
#df2[13,]
#              datetime Var2          datetimeEP
#13 2017-03-26 02:00:00    4 2017-03-26 01:00:00

# To me it looks like your recorded times don't consider "daylight savings time".
# So you have to use e.g. "Etc/GMT-1" (which means UTC+1) instead of "Europe/Paris"
df2$datetimeG1 <- as.POSIXct(df2$datetime, format = "%Y-%m-%d %H", tz= "Etc/GMT-1")
data.frame(datetime=df2$datetime, utc=with_tz(df2$datetimeG1,"UTC"))
#              datetime                 utc
#1  2016-10-29 22:00:00 2016-10-29 21:00:00
#2  2016-10-29 23:00:00 2016-10-29 22:00:00
#3  2016-10-30 00:00:00 2016-10-29 23:00:00
#4  2016-10-30 01:00:00 2016-10-30 00:00:00
#5  2016-10-30 02:00:00 2016-10-30 01:00:00
#6  2016-10-30 03:00:00 2016-10-30 02:00:00
#7  2016-10-30 04:00:00 2016-10-30 03:00:00
#8  2016-10-30 05:00:00 2016-10-30 04:00:00
#9  2017-03-25 22:00:00 2017-03-25 21:00:00
#10 2017-03-25 23:00:00 2017-03-25 22:00:00
#11 2017-03-26 00:00:00 2017-03-25 23:00:00
#12 2017-03-26 01:00:00 2017-03-26 00:00:00
#13 2017-03-26 02:00:00 2017-03-26 01:00:00
#14 2017-03-26 03:00:00 2017-03-26 02:00:00
#15 2017-03-26 04:00:00 2017-03-26 03:00:00

#You can use "dst" to see if datetime of a time zone has "daylight savings time"
dst(df2$datetimeEP)
dst(df2$datetimeG1)
dst(with_tz(df2$datetimeEP,"UTC"))
dst(with_tz(df2$datetimeG1,"UTC"))

#If your recorded times consider "daylight savings time" then you HAVE a gap and an overlap.
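
If the fixed-offset assumption above holds, the merge itself can be sketched as follows. This is a minimal base R example on a hypothetical three-row subset of the data, not the full data frames from the question; setting the "tzone" attribute is used here as a base R stand-in for with_tz(), since it changes only how the instants are displayed.

```r
# df1 is already in UTC (three rows picked for illustration)
df1 <- data.frame(
  datetime = as.POSIXct(c("2016-10-30 00:00:00", "2016-10-30 01:00:00",
                          "2016-10-30 02:00:00"), tz = "UTC"),
  Var1 = c(76, 54, 34)
)

# df2 holds clock times assumed to be recorded at a fixed UTC+1 offset
df2 <- data.frame(
  datetime = c("2016-10-30 01:00:00", "2016-10-30 02:00:00",
               "2016-10-30 03:00:00"),
  Var2 = c(14, 51, 27),
  stringsAsFactors = FALSE
)

# "Etc/GMT-1" is UTC+1: the sign in these zone names is inverted
df2$datetime <- as.POSIXct(df2$datetime, format = "%Y-%m-%d %H:%M:%S",
                           tz = "Etc/GMT-1")

# Same instants, now displayed in UTC
attr(df2$datetime, "tzone") <- "UTC"

# Every row finds its partner; nothing is lost or duplicated
merged <- merge(df1, df2, by = "datetime")
print(merged)
```

Because a fixed offset maps each clock time to exactly one UTC instant, the converted column keeps the same hourly spacing as the original and lines up row for row with df1.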