I created a data frame containing the following data:
idCol <- c('1','1','2','2')
stepCol <- c('step1' , 'step2' , 'step1' , 'step2')
timestampCol <- c('01-01-2017:09.00', '01-01-2017:10.00', '01-01-2017:09:00', '01-01-2017:14.00')
mydata <- data.frame(idCol , stepCol , timestampCol)
colnames(mydata) <- c('id' , 'steps' , 'timestamp')
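For a given id, timestamp is the time at which the step named in steps begins; when step2 starts, step1 has ended. I am trying to build a tibble containing the average duration of each step, based on the step start times.
So I am trying to produce: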
step , averagetime
step1 , 1 hour
step2 , 5 hours
The closest I have got is:
diffTime <- c(0, difftime(ymd_hms(mydata$timestamp[-1]), ymd_hms(mydata$timestamp[-nrow(mydata)]), units="hours"))
diffTime %>% group_by(id, steps) %>% summarize(mean(diffTime))
But this returns the error:
Error in UseMethod("group_by_") :
no applicable method for 'group_by_' applied to an object of class "c('double', 'numeric')"
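Answer 0 (score: 1)
I made a few small changes to your code, but essentially you need to attach the result of the ymd_hms computation to your mydata, so that group_by receives a data frame rather than a bare numeric vector: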
library(dplyr)
mydata$diffTime <- c(0, difftime(lubridate::ymd_hms(mydata$timestamp[-1]),
                                 lubridate::ymd_hms(mydata$timestamp[-nrow(mydata)]), units="hours"))
diffTime <- mydata %>% group_by(id) %>% summarize(mean(diffTime))
This returns:
R> diffTime
# A tibble: 2 x 2
id `mean(diffTime)`
<chr> <dbl>
1 1 0.008333
2 2 0.033333
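The means above are small fractions of an hour (0.008333 hours is 30 seconds), whereas the question expects 1 and 5 hours, so it looks as if ymd_hms is not reading strings such as '01-01-2017:09.00' as month-day-year timestamps. The differences are also taken down the whole column rather than within each id. A minimal base R sketch of one way to check this, assuming the pattern `%m-%d-%Y:%H.%M`, a UTC time zone, and the third timestamp rewritten as '09.00' (all three are assumptions, not something the question states):

```r
# Sample data from the question; the third timestamp is written '09.00'
# here so that every value follows the same pattern (an assumption).
mydata <- data.frame(
  id = c('1', '1', '2', '2'),
  steps = c('step1', 'step2', 'step1', 'step2'),
  timestamp = c('01-01-2017:09.00', '01-01-2017:10.00',
                '01-01-2017:09.00', '01-01-2017:14.00'),
  stringsAsFactors = FALSE
)

# Parse with an explicit format instead of letting a parser guess the order.
parsed <- as.POSIXct(mydata$timestamp, format = "%m-%d-%Y:%H.%M", tz = "UTC")

# Hours since the previous row of the same id (NA for the first row of an id).
hours_since_prev <- ave(as.numeric(parsed), mydata$id,
                        FUN = function(x) c(NA, diff(x)) / 3600)

mydata$hours_since_prev <- hours_since_prev
print(mydata)
```

With the data above, the second row of each id comes out at 1 and 5 hours respectively, which is closer to what the question describes.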
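Answer 1 (score: 1)
Note that the time notation in the timestamp column of the sample data is inconsistent: the third value is written 09:00 while the others use 09.00. With that fixed: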
timestampCol <- c('01-01-2017:09.00', '01-01-2017:10.00', '01-01-2017:09.00', '01-01-2017:14.00')
Convert the strings to time values (as.character makes this work whether the column is a factor or a character vector):
mydata$timestamp <- as.POSIXct(strptime(as.character(mydata$timestamp), format="%m-%d-%Y:%H.%M"))
library(dplyr)
mydata %>%
  group_by(id) %>%
  mutate(diff = difftime(timestamp, lag(timestamp))) %>%
  summarise(na.omit(diff))
# A tibble: 2 x 2
id `na.omit(diff)`
<fctr> <time>
1 1 1 hours
2 2 5 hours
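The result above is grouped by id. If the goal is instead the mean duration of each step across ids, which is only one possible reading of the question, a base R sketch could look like the following. It assumes that a step ends when the next step of the same id starts, so the last step of each id has no measurable duration and is dropped; the column names time and hours are introduced here purely for illustration.

```r
# Corrected sample data (third timestamp written as '09.00', as in this answer).
mydata <- data.frame(
  id = c('1', '1', '2', '2'),
  steps = c('step1', 'step2', 'step1', 'step2'),
  timestamp = c('01-01-2017:09.00', '01-01-2017:10.00',
                '01-01-2017:09.00', '01-01-2017:14.00'),
  stringsAsFactors = FALSE
)
mydata$time <- as.POSIXct(mydata$timestamp, format = "%m-%d-%Y:%H.%M", tz = "UTC")

# Put each id's steps in chronological order.
mydata <- mydata[order(mydata$id, mydata$time), ]

# Duration of a step = start of the next step of the same id minus its own
# start, in hours; the last step of each id gets NA.
mydata$hours <- ave(as.numeric(mydata$time), mydata$id,
                    FUN = function(x) c(diff(x), NA) / 3600)

# Mean duration per step; the formula interface drops the NA rows by default.
step_means <- aggregate(hours ~ steps, data = mydata, FUN = mean)
print(step_means)
```

With this data only step1 has a measurable duration, and its mean is 3 hours (the average of 1 and 5). That differs from the table in the question, so treat this as an interpretation rather than the asker's intended result.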