as.numeric(as.POSIXct(x)) only works some of the time

Time: 2018-06-08 10:18:07

Tags: r

I am currently working on a survey analysis that uses a regression discontinuity design.

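I have separate variables for the year, month, day, hour, and minute at which each survey interview started, and the same five variables for when it ended.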

Using paste(), I have combined these into the variables starttime and endtime, both of which are character. I then use as.POSIXct() to tell R that the strings are datetimes in the format yyyy-mm-dd hh:mm.
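As a side note on this step, here is a minimal sketch of what the pasted strings actually look like. The toy vectors are hypothetical (they copy the first two rows of the data extract further down), and the sprintf() variant is only one possible way to get strictly zero-padded yyyy-mm-dd hh:mm strings, not something the original script does.

```r
# Hypothetical toy components (first two rows of the data extract)
yy <- c(2014, 2014); mm <- c(12, 11); dd <- c(3, 22)
hh <- c(11, 11);     mi <- c(5, 49)

# paste() does not zero-pad, so single-digit days and minutes stay single-digit
pasted <- paste(paste(yy, mm, dd, sep = "-"), paste(hh, mi, sep = ":"))

# sprintf() with %02d pads each component to two digits
padded <- sprintf("%04d-%02d-%02d %02d:%02d",
                  as.integer(yy), as.integer(mm), as.integer(dd),
                  as.integer(hh), as.integer(mi))

print(pasted)
print(padded)
```

The unpadded form parsed without complaint for starttime in the question, so padding is mainly useful for making the strings easier to inspect by eye.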

Since time is the independent variable in my design, I need the datetimes as numeric values, so I apply the following code:

ESSFR$starttime_secs <- as.numeric(as.POSIXct(ESSFR$starttime))

ESSFR$endtime_secs <- as.numeric(as.POSIXct(ESSFR$endtime))

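The problem is that this code works only for ESSFR$starttime and not for ESSFR$endtime. When I apply it to ESSFR$endtime, I get the message:

  character string is not in a standard unambiguous format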

Does anyone know why the code only works for me some of the time?

Here is an extract of the data:

    > dput(head(ESSFR[,582:591]))
structure(list(inwdds = structure(c(3, 22, 17, 21, 6, 4), labels = structure(99, .Names = "Not available"), class = "labelled"), 
    inwmms = structure(c(12, 11, 11, 11, 12, 12), labels = structure(99, .Names = "Not available"), class = "labelled"), 
    inwyys = structure(c(2014, 2014, 2014, 2014, 2014, 2014), labels = structure(9999, .Names = "Not available"), class = "labelled"), 
    inwshh = structure(c(11, 11, 16, 18, 11, 17), labels = structure(99, .Names = "Not available"), class = "labelled"), 
    inwsmm = structure(c(5, 49, 21, 36, 54, 21), labels = structure(99, .Names = "Not available"), class = "labelled"), 
    inwdde = structure(c(3, 22, 17, 21, 6, 4), labels = structure(99, .Names = "Not available"), class = "labelled"), 
    inwmme = structure(c(12, 11, 11, 11, 12, 12), labels = structure(99, .Names = "Not available"), class = "labelled"), 
    inwyye = structure(c(2014, 2014, 2014, 2014, 2014, 2014), labels = structure(9999, .Names = "Not available"), class = "labelled"), 
    inwehh = structure(c(12, 12, 18, 20, 13, 18), labels = structure(99, .Names = "Not available"), class = "labelled"), 
    inwemm = structure(c(13, 59, 5, 0, 7, 45), labels = structure(99, .Names = "Not available"), class = "labelled")), .Names = c("inwdds", 
"inwmms", "inwyys", "inwshh", "inwsmm", "inwdde", "inwmme", "inwyye", 
"inwehh", "inwemm"), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

Here is the code:

library(dplyr) # provides %>%, filter() and glimpse()

#Creating a data frame consisting only of the French answers
ESSFR <- ESSData %>%
  filter(cntry == "FR")

#Collapsing the separate time variables into one.
#The time variables are: 
  #Start year = inwyys
  #Start month = inwmms
  #Start day = inwdds
  #Start hour = inwshh
  #Start minute = inwsmm

  #End year = inwyye
  #End month = inwmme
  #End day = inwdde
  #End hour = inwehh
  #End minute = inwemm

#Collapsing starttime variable
ESSFR$startdate <- paste(ESSFR$inwyys,"-",ESSFR$inwmms,"-",ESSFR$inwdds, sep = "")
ESSFR$startdate

ESSFR$startdaytime <- paste(ESSFR$inwshh,":",ESSFR$inwsmm, sep = "")
ESSFR$startdaytime

ESSFR$starttime <- paste(ESSFR$startdate,ESSFR$startdaytime)
ESSFR$starttime
class(ESSFR$starttime) #string variable generated

#Collapsing endtime variable
ESSFR$enddate <- paste(ESSFR$inwyye,"-",ESSFR$inwmme,"-",ESSFR$inwdde, sep = "")
ESSFR$enddate

ESSFR$enddaytime <- paste(ESSFR$inwehh,":",ESSFR$inwemm, sep = "")
ESSFR$enddaytime

ESSFR$endtime <- paste(ESSFR$enddate,ESSFR$enddaytime)
ESSFR$endtime
class(ESSFR$endtime) #string variable generated

#Looking at the two variables
glimpse(ESSFR$starttime)
glimpse(ESSFR$endtime)
#Looking good

#Transforming the two time variables from string to numeric variables.
ESSFR$starttime_secs <- as.numeric(as.POSIXct(ESSFR$starttime))
ESSFR$starttime_secs

ESSFR$endtime_secs <- as.numeric(as.POSIXct(ESSFR$endtime))
ESSFR$endtime_secs

Here is a link to the data and the current script: https://wetransfer.com/downloads/cb528871a341c1b2118d5db9e03d16ee20180608103455/11ca2d

Thanks in advance.

1 answer:

Answer 0: (score: 1)

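Probably some of the end times are NA or blank. If they look fine when you print them, then most of them are probably okay, but there is something bad lurking in there.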
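One way to hunt for such entries is sketched below. The toy vector is hypothetical, tz = "UTC" is an assumption, and the format string is a guess at the layout the question builds with paste(). The point it illustrates is that paste() turns a missing component into the literal text "NA", so is.na() on the string will not flag it, whereas strptime() with an explicit format will.

```r
# Hypothetical end-time strings: one good, one pasted from missing components, one blank
endtime <- c("2014-12-3 12:13",
             paste0(NA, "-", NA, "-", NA, " ", NA, ":", NA),
             "")

# paste0() produced the text "NA-NA-NA NA:NA", which is not a missing value
is.na(endtime)

# strptime() with an explicit format returns NA for anything it cannot parse
parsed <- strptime(endtime, format = "%Y-%m-%d %H:%M", tz = "UTC")

bad <- which(is.na(parsed))
bad           # positions of the unparseable entries
endtime[bad]  # the offending strings
```

Running the same check on the real ESSFR$endtime should show which rows, if any, are causing the error.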

You can use the following code to process the entries one at a time, giving NA for the bad ones. Don't use it in production; it is slow:

sapply(ESSFR$endtime,  # the character vector built with paste()
       function(x) 
         tryCatch(as.POSIXct(x), error = function(e) NA))

For example,

ESSFR <- list(endtime = c("2018-06-07 11:00 AM", "bad"))

sapply(ESSFR$endtime, 
         function(x) 
           tryCatch(as.POSIXct(x), error = function(e) NA))
# The numeric value depends on the session's time zone
#> 2018-06-07 11:00 AM                 bad 
#>          1528383600                  NA

You could also use strptime(), which gives NA for the bad entries, but then you need to specify the format explicitly.
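A minimal sketch of that strptime() route follows. The toy vector is hypothetical, the format string assumes the "year-month-day hour:minute" layout from the question, and tz = "UTC" is an assumption that should be replaced by the time zone the interviews were recorded in.

```r
# Hypothetical end-time strings in the layout the question builds with paste()
endtime <- c("2014-12-3 12:13", "bad")

# With an explicit format, unparseable entries become NA instead of raising an error
parsed <- strptime(endtime, format = "%Y-%m-%d %H:%M", tz = "UTC")

# Seconds since 1970-01-01 00:00 UTC; NA where parsing failed
endtime_secs <- as.numeric(as.POSIXct(parsed))
endtime_secs
```

Because strptime() is vectorized, this avoids the per-entry tryCatch() loop, and the NA positions show which rows need a closer look.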