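I am currently working on a regression discontinuity design.
I have separate variables for the year, month, day, hour and minute at which the survey started, and the same five variables for when it ended.
Using paste(), I have combined them into a starttime and an endtime variable, both of which are character.
I then used as.POSIXct() to tell R that these character strings are datetimes in the format yyyy-mm-dd hh:mm.
Time is the running (independent) variable in my design, so I need the datetimes as numbers. I apply the following code: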
ESSFR$starttime_secs <- as.numeric(as.POSIXct(ESSFR$starttime))
ESSFR$endtime_secs <- as.numeric(as.POSIXct(ESSFR$endtime))
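The problem is that the code only works for ESSFR$starttime, not for ESSFR$endtime. When applied to ESSFR$endtime, I get the error
character string is not in a standard unambiguous format
Does anyone know why the code works for one variable but not the other?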
Here is an excerpt of the data:
> dput(head(ESSFR[,582:591]))
structure(list(inwdds = structure(c(3, 22, 17, 21, 6, 4), labels = structure(99, .Names = "Not available"), class = "labelled"),
inwmms = structure(c(12, 11, 11, 11, 12, 12), labels = structure(99, .Names = "Not available"), class = "labelled"),
inwyys = structure(c(2014, 2014, 2014, 2014, 2014, 2014), labels = structure(9999, .Names = "Not available"), class = "labelled"),
inwshh = structure(c(11, 11, 16, 18, 11, 17), labels = structure(99, .Names = "Not available"), class = "labelled"),
inwsmm = structure(c(5, 49, 21, 36, 54, 21), labels = structure(99, .Names = "Not available"), class = "labelled"),
inwdde = structure(c(3, 22, 17, 21, 6, 4), labels = structure(99, .Names = "Not available"), class = "labelled"),
inwmme = structure(c(12, 11, 11, 11, 12, 12), labels = structure(99, .Names = "Not available"), class = "labelled"),
inwyye = structure(c(2014, 2014, 2014, 2014, 2014, 2014), labels = structure(9999, .Names = "Not available"), class = "labelled"),
inwehh = structure(c(12, 12, 18, 20, 13, 18), labels = structure(99, .Names = "Not available"), class = "labelled"),
inwemm = structure(c(13, 59, 5, 0, 7, 45), labels = structure(99, .Names = "Not available"), class = "labelled")), .Names = c("inwdds",
"inwmms", "inwyys", "inwshh", "inwsmm", "inwdde", "inwmme", "inwyye",
"inwehh", "inwemm"), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
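The value labels in this excerpt mark 99 (and 9999 for years) as "Not available". If such codes, or genuine NAs, occur in the end-time columns, the pasted endtime strings will contain text such as "9999-99-99 99:99" or "NA-NA-NA NA:NA", which no datetime format accepts. A minimal sketch for clearing them before conversion, assuming the codes in the labels are the only missing-value codes; the two-row data frame d is made up in place of ESSFR, and tz = "UTC" is an assumption:

```r
# Hypothetical stand-in for the ESS end-time columns: row 1 is taken from
# the excerpt, row 2 holds the "Not available" codes listed in its labels.
d <- data.frame(inwyye = c(2014, 9999), inwmme = c(12, 99), inwdde = c(3, 99),
                inwehh = c(12, 99), inwemm = c(13, 99))

# Missing-value code per column (9999 for years, 99 otherwise). No real
# month, day, hour or minute can be 99, so recoding them is safe.
na_codes <- c(inwyye = 9999, inwmme = 99, inwdde = 99, inwehh = 99, inwemm = 99)
for (v in names(na_codes)) {
  x <- as.numeric(d[[v]])                 # plain numeric copy of the column
  x[which(x == na_codes[[v]])] <- NA      # "Not available" code -> real NA
  d[[v]] <- x
}

# paste0() writes NA out as the text "NA", so blank out incomplete rows
# after pasting; as.POSIXct() skips genuine NA values.
d$endtime <- paste0(d$inwyye, "-", d$inwmme, "-", d$inwdde, " ",
                    d$inwehh, ":", d$inwemm)
d$endtime[!complete.cases(d[names(na_codes)])] <- NA
d$endtime_secs <- as.numeric(as.POSIXct(d$endtime, tz = "UTC"))
d$endtime_secs
```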
Here is the code:
library(dplyr) #provides %>%, filter() and glimpse()
#Creating Dataframe only consisting of French answers
ESSFR <- ESSData %>%
filter(cntry == "FR")
#Collapsing the separate time variables to one.
#The time variables are:
#Start year = inwyys
#Start month = inwmms
#Start day = inwdds
#Start hour = inwshh
#Start minute = inwsmm
#End year = inwyye
#End month = inwmme
#End day = inwdde
#End hour = inwehh
#End minute = inwemm
#Collapsing starttime variable
ESSFR$startdate <- paste(ESSFR$inwyys,"-",ESSFR$inwmms,"-",ESSFR$inwdds, sep = "")
ESSFR$startdate
ESSFR$startdaytime <- paste(ESSFR$inwshh,":",ESSFR$inwsmm, sep = "")
ESSFR$startdaytime
ESSFR$starttime <- paste(ESSFR$startdate,ESSFR$startdaytime)
ESSFR$starttime
class(ESSFR$starttime) #string variable generated
#Collapsing endtime variable
ESSFR$enddate <- paste(ESSFR$inwyye,"-",ESSFR$inwmme,"-",ESSFR$inwdde, sep = "")
ESSFR$enddate
ESSFR$enddaytime <- paste(ESSFR$inwehh,":",ESSFR$inwemm, sep = "")
ESSFR$enddaytime
ESSFR$endtime <- paste(ESSFR$enddate,ESSFR$enddaytime)
ESSFR$endtime
class(ESSFR$endtime) #string variable generated
#Looking at the two variables
glimpse(ESSFR$starttime)
glimpse(ESSFR$endtime)
#Looking good
#Transforming the two time variables from string to numerical variables.
ESSFR$starttime_secs <- as.numeric(as.POSIXct(ESSFR$starttime))
ESSFR$starttime_secs
ESSFR$endtime_secs <- as.numeric(as.POSIXct(ESSFR$endtime))
ESSFR$endtime_secs
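As an alternative to pasting strings and re-parsing them, base R's ISOdatetime() can build the datetimes directly from the numeric components; rows containing NA or an impossible value such as month 99 come out as NA instead of stopping the whole conversion. This is a sketch under assumptions: the toy data frame end is hypothetical (its first three rows copy the excerpt, the fourth holds the "Not available" codes), seconds are set to 0, and the time zone is taken to be UTC.

```r
# Hypothetical toy data: rows 1-3 copy the excerpt's end-time values,
# row 4 holds the "Not available" codes (9999 / 99) from the labels.
end <- data.frame(
  inwyye = c(2014, 2014, 2014, 9999),
  inwmme = c(12, 11, 11, 99),
  inwdde = c(3, 22, 17, 99),
  inwehh = c(12, 12, 18, 99),
  inwemm = c(13, 59, 5, 99)
)

# ISOdatetime() returns POSIXct straight from the numeric parts; invalid
# combinations become NA. tz = "UTC" is an assumption, not the survey's zone.
end$endtime <- ISOdatetime(end$inwyye, end$inwmme, end$inwdde,
                           end$inwehh, end$inwemm, 0, tz = "UTC")

# Seconds since 1970-01-01 00:00 UTC, usable as a numeric running variable.
end$endtime_secs <- as.numeric(end$endtime)

# Rows that could not be converted and need checking.
which(is.na(end$endtime))
```

The same call with the start-time columns (inwyys, inwmms, inwdds, inwshh, inwsmm) would give starttime on the same scale.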
Here is a link to the data and the current script: https://wetransfer.com/downloads/cb528871a341c1b2118d5db9e03d16ee20180608103455/11ca2d
Thanks in advance.
Answer 0 (score: 1)
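Probably some of the end times are NA or blank. If they look fine when you print them, most of them are probably OK, but something bad is lurking in there. Keep in mind that paste() turns a missing component into the literal text "NA", so such a row becomes a string like "NA-NA-NA NA:NA" rather than a real NA, and as.POSIXct() stops with this error if any non-missing string matches none of the formats it tries.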
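A small reproduction of that failure, assuming the end time of the first row had missing components; the vector endtime here is made up and tz = "UTC" is an assumption:

```r
# Made-up end times built the way the question builds them; the first row
# had missing components, which paste0() writes out as the text "NA".
endtime <- c(paste0(NA, "-", NA, "-", NA, " ", NA, ":", NA),
             paste0(2014, "-", 12, "-", 3, " ", 12, ":", 13))
endtime   # "NA-NA-NA NA:NA" "2014-12-3 12:13"

# "NA-NA-NA NA:NA" is an ordinary string, not a missing value, and it matches
# none of the formats as.POSIXct() tries, so the whole call stops.
msg <- tryCatch(as.POSIXct(endtime, tz = "UTC"),
                error = function(e) conditionMessage(e))
msg

# Turning it into a genuine NA lets the remaining strings convert.
endtime[endtime == "NA-NA-NA NA:NA"] <- NA
as.numeric(as.POSIXct(endtime, tz = "UTC"))
```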
You can use this code to process one entry at a time, returning NA for the bad entries. Don't use it in production; it's slow:
sapply(ESSFR$endtime,
function(x)
tryCatch(as.POSIXct(x), error = function(e) NA))
For example:
ESSFR <- list(endtime = c("2018-06-07 11:00 AM", "bad"))
sapply(ESSFR$endtime,
function(x)
tryCatch(as.POSIXct(x), error = function(e) NA))
#> 2018-06-07 11:00 AM bad
#> 1528383600 NA
You can also use strptime() and get NA for the bad entries, but then you need to specify the format explicitly.
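A vectorised sketch of that approach, assuming the strings follow the question's "year-month-day hour:minute" layout (without zero padding, which these conversion codes accept on input) and that UTC is an acceptable time zone; the example strings are hypothetical:

```r
# Hypothetical strings in the question's layout; the last two stand in for
# a row with missing parts and a row holding the 99/9999 codes.
endtime <- c("2014-12-3 12:13", "2014-11-22 12:59",
             "NA-NA-NA NA:NA", "9999-99-99 99:99")

# With an explicit format, strptime() is vectorised and returns NA for any
# string it cannot parse instead of stopping. tz = "UTC" is an assumption.
parsed <- strptime(endtime, format = "%Y-%m-%d %H:%M", tz = "UTC")
endtime_secs <- as.numeric(as.POSIXct(parsed))

# Inspect the offending entries before deciding how to treat them.
endtime[is.na(endtime_secs)]
```

Unlike the sapply()/tryCatch() loop, this parses the whole column in one call, so it stays fast on the full data set.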