Formatting dates in R (non-standard format)

Date: 2015-10-12 23:55:22

Tags: r timezone strptime

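I'm not new to R, or to formatting dates in R, and normally wouldn't ask this, but I'm seeing some very strange behavior and in the past 2 hours I haven't gotten any closer to solving it.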

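I have an imported dataset and want to format its date/time column with as.POSIXct. The dates are in a non-standard format, and I've applied what I know to be the correct format string. Below is a small subset of the data I'm having trouble with, followed by the code. The problem is that there are 4 NAs starting at "2015-03-08 02:00:00 PST". What gives? It seems completely random, since it doesn't happen anywhere in the other 55K observations.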

bad.Dates<-c("3/7/2015 14:15", "3/7/2015 14:30", "3/7/2015 14:45", "3/7/2015 15:00", 
         "3/7/2015 15:15", "3/7/2015 15:30", "3/7/2015 15:45", "3/7/2015 16:00", 
         "3/7/2015 16:15", "3/7/2015 16:30", "3/7/2015 16:45", "3/7/2015 17:00", 
         "3/7/2015 17:15", "3/7/2015 17:30", "3/7/2015 17:45", "3/7/2015 18:00", 
         "3/7/2015 18:15", "3/7/2015 18:30", "3/7/2015 18:45", "3/7/2015 19:00", 
         "3/7/2015 19:15", "3/7/2015 19:30", "3/7/2015 19:45", "3/7/2015 20:00", 
         "3/7/2015 20:15", "3/7/2015 20:30", "3/7/2015 20:45", "3/7/2015 21:00", 
         "3/7/2015 21:15", "3/7/2015 21:30", "3/7/2015 21:45", "3/7/2015 22:00", 
         "3/7/2015 22:15", "3/7/2015 22:30", "3/7/2015 22:45", "3/7/2015 23:00", 
         "3/7/2015 23:15", "3/7/2015 23:30", "3/7/2015 23:45", "3/8/2015 0:00", 
         "3/8/2015 0:15", "3/8/2015 0:30", "3/8/2015 0:45", "3/8/2015 1:00", 
         "3/8/2015 1:15", "3/8/2015 1:30", "3/8/2015 1:45", "3/8/2015 2:00", 
         "3/8/2015 2:15", "3/8/2015 2:30", "3/8/2015 2:45", "3/8/2015 3:00", 
         "3/8/2015 3:15", "3/8/2015 3:30", "3/8/2015 3:45", "3/8/2015 4:00", 
         "3/8/2015 4:15", "3/8/2015 4:30", "3/8/2015 4:45", "3/8/2015 5:00", 
         "3/8/2015 5:15", "3/8/2015 5:30", "3/8/2015 5:45", "3/8/2015 6:00", 
         "3/8/2015 6:15", "3/8/2015 6:30", "3/8/2015 6:45", "3/8/2015 7:00", 
         "3/8/2015 7:15", "3/8/2015 7:30", "3/8/2015 7:45", "3/8/2015 8:00", 
         "3/8/2015 8:15", "3/8/2015 8:30", "3/8/2015 8:45", "3/8/2015 9:00", 
         "3/8/2015 9:15", "3/8/2015 9:30", "3/8/2015 9:45", "3/8/2015 10:00", 
         "3/8/2015 10:15", "3/8/2015 10:30", "3/8/2015 10:45", "3/8/2015 11:00", 
         "3/8/2015 11:15", "3/8/2015 11:30", "3/8/2015 11:45", "3/8/2015 12:00", 
         "3/8/2015 12:15", "3/8/2015 12:30", "3/8/2015 12:45", "3/8/2015 13:00", 
         "3/8/2015 13:15", "3/8/2015 13:30", "3/8/2015 13:45", "3/8/2015 14:00", 
         "3/8/2015 14:15", "3/8/2015 14:30", "3/8/2015 14:45", "3/8/2015 15:00", 
         "3/8/2015 15:15") 

as.POSIXct(strptime(bad.Dates,"%m/%d/%Y %H:%M"))

1 answer:

Answer 0 (score: 3)

To make this example reproducible and solvable regardless of where you are, specify the time zone explicitly via tz=:

bad.Dates <- c("3/8/2015 1:45", "3/8/2015 2:00", "3/8/2015 2:15",
               "3/8/2015 2:30", "3/8/2015 2:45", "3/8/2015 3:00")
as.POSIXct(bad.Dates, format="%m/%d/%Y %H:%M", tz="US/Pacific")

#[1] "2015-03-08 01:45:00 PST"
#[2] NA                       
#[3] NA                       
#[4] NA                       
#[5] NA                       
#[6] "2015-03-08 03:00:00 PDT"

You get NA because those clock times do not exist in the US Pacific time zone under modern daylight saving rules:

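"Most of the United States, Canada, and Mexico's northern border cities will begin Daylight Saving Time (DST) on Sunday, March 8, 2015. People in areas observing DST will spring forward one hour from 2:00 AM (02:00) to 3:00 AM (03:00) local time."
Source: http://www.timeanddate.com/news/time/usa-canada-start-dst-2015.html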
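The jump is easy to confirm by measuring real elapsed time across it. A minimal sketch, assuming your R installation has the "US/Pacific" zone in its time zone database (the same zone used above): 1:45 PST and 3:00 PDT turn out to be only 15 minutes apart.

```r
# Two clock readings on either side of the 2015 spring-forward jump
x <- as.POSIXct(c("3/8/2015 1:45", "3/8/2015 3:00"),
                format = "%m/%d/%Y %H:%M", tz = "US/Pacific")

# Real elapsed time is 15 minutes, not 75: the clock skipped 02:00-02:59
difftime(x[2], x[1], units = "mins")
```

Parsed with tz="UTC" instead, the same two readings are 75 minutes apart, because UTC has no jump.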

Specifying a time zone that does not observe daylight saving time, such as "UTC", avoids the problem:

as.POSIXct(bad.Dates, format="%m/%d/%Y %H:%M", tz="UTC")
#[1] "2015-03-08 01:45:00 UTC"
#[2] "2015-03-08 02:00:00 UTC"
#[3] "2015-03-08 02:15:00 UTC"
#[4] "2015-03-08 02:30:00 UTC"
#[5] "2015-03-08 02:45:00 UTC"
#[6] "2015-03-08 03:00:00 UTC"
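With tz="UTC" the values parse, but they are stored as if the clock readings were UTC instants. If the logger actually recorded Pacific standard time all year and never switched to DST (an assumption: the 02:00 to 02:45 readings in the data suggest it but don't prove it), a fixed-offset zone says so directly. In the tz database, "Etc/GMT+8" means UTC-8 with no DST; the sign is inverted by POSIX convention. A sketch under that assumption:

```r
bad.Dates <- c("3/8/2015 1:45", "3/8/2015 2:00", "3/8/2015 2:15",
               "3/8/2015 2:30", "3/8/2015 2:45", "3/8/2015 3:00")

# Assumption: readings are fixed PST (UTC-8) all year, never PDT.
# "Etc/GMT+8" is UTC-8 with no DST, so there is no gap and no NA.
pst <- as.POSIXct(bad.Dates, format = "%m/%d/%Y %H:%M", tz = "Etc/GMT+8")

anyNA(pst)                             # FALSE
format(pst, "%H:%M", tz = "UTC")       # true instants: 09:45 ... 11:00
format(pst, "%H:%M", tz = "US/Pacific") # DST-aware local clock: 01:45, 03:00 ... 04:00
```

Because the underlying instants are correct, differences and sorting behave properly, and you can still display the data in local Pacific time with format(..., tz = "US/Pacific").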