I was just wondering whether anyone could tell me how to do the following calculation in R.
I have a one-year hourly dataset with 3 columns: "date", "time" and "values".
For example:
'01/01/2000' '08:00' '10'
'01/01/2000' '09:00' '30'
'01/01/2000' '10:00' '43'
'01/01/2000' '11:00' '55'
'01/01/2000' '12:00' '59'
'01/01/2000' '13:00' '45'
'01/01/2000' '14:00' '10'
'01/01/2000' '15:00' '15'
'01/01/2000' '16:00' '43'
'01/01/2000' '17:00' '45'
'01/01/2000' '18:00' '60'
'01/01/2000' '19:00' '10'
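I would like to create a data.frame that gives the length of each episode in which the value is greater than 40, identified (if possible) by its start date and time. In the table above, the first exceedance starts at 10:00 and lasts 4 hours, and the second starts at 16:00 and lasts 3 hours. Is it possible to create a data frame like the following?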
'date' 'time' 'Duration'
'01/01/2000' '10:00' '4'
'01/01/2000' '16:00' '3'
and so on for the whole year of data.
Answer 0 (score: 4)
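Here is another solution, relying on plyr.
It makes it easier to compute other quantities
on each spell of values above 40, for example the mean or the maximum.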
# Sample data
k <- 3
d <- data.frame(
date = rep( seq.Date( Sys.Date(), length=k, by="day" ), each=24 ),
time = sprintf( "%02d:00", rep( 0:23, k ) ),
value = round(200*runif(24*k))
)
d$timestamp <- as.POSIXct( paste( d$date, d$time ) )
d <- d[ order( d$timestamp ), ]
# Extract the spells above 40
n <- nrow(d)
d$inside <- d$value > 40
d$start <- ! c(FALSE, d$inside[-n]) & d$inside
d$end <- d$inside & ! c(d$inside[-1], FALSE) # Not used
d$group <- cumsum(d$start) # Number the spells
d <- d[ d$inside, ]
library(plyr)
ddply( d, "group", summarize,
start = min(timestamp),
end = max(timestamp),
length = length(value),
mean = mean(value)
)
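If plyr is not available, base R's `tapply` can produce a similar per-spell summary. This is a minimal sketch that swaps in `tapply` for `ddply`; it assumes the rows are already in time order, and it reuses only the question's twelve example values (the names `spells` and `stats` are hypothetical).

```r
# The question's example values, assumed to be in time order
value  <- c(10, 30, 43, 55, 59, 45, 10, 15, 43, 45, 60, 10)
inside <- value > 40
# A spell starts on an hour above 40 whose previous hour was not above 40
start  <- inside & !c(FALSE, inside[-length(inside)])
# Number the spells and keep only the hours above 40
spells <- data.frame(group = cumsum(start)[inside], value = value[inside])
# One row per spell: number of hours, mean and maximum value
stats <- data.frame(
  group  = unique(spells$group),
  length = as.vector(tapply(spells$value, spells$group, length)),
  mean   = as.vector(tapply(spells$value, spells$group, mean)),
  max    = as.vector(tapply(spells$value, spells$group, max))
)
stats
```

Because `group` increases along the series, `unique(spells$group)` lines up with the sorted groups that `tapply` returns.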
Spells of values above 40 can extend over several days: this may or may not be what you want.
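If spells should instead stop at midnight, one option is to also start a new spell whenever the date changes. The sketch below uses a small hypothetical data frame `hourly` (not from the question) and assumes its rows are sorted by date and time with no missing hours. It only compares date strings for equality, so the day/month order of the date format does not matter.

```r
# Hypothetical hourly data spanning midnight
hourly <- data.frame(
  date  = rep(c("01/01/2000", "02/01/2000"), each = 3),
  time  = c("21:00", "22:00", "23:00", "00:00", "01:00", "02:00"),
  value = c(10, 50, 60, 70, 80, 20),
  stringsAsFactors = FALSE
)
n <- nrow(hourly)
inside      <- hourly$value > 40
prev_inside <- c(FALSE, inside[-n])
# TRUE on the first row of each date
new_day <- c(TRUE, hourly$date[-1] != hourly$date[-n])
# A spell starts on an hour above 40 if the previous hour was not above 40,
# or if the previous hour belongs to a different date
start <- inside & (!prev_inside | new_day)
above <- hourly[inside, ]
above$group <- cumsum(start)[inside]
# First row of each spell gives its start; run lengths give the durations
first <- !duplicated(above$group)
spells <- data.frame(
  date     = above$date[first],
  time     = above$time[first],
  Duration = rle(above$group)$lengths,
  stringsAsFactors = FALSE
)
spells
```

Without the `new_day` term, the four hours from 22:00 to 01:00 would form a single 4-hour spell. With it, they become two 2-hour spells, one on each date.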
Answer 1 (score: 3)
Another option:
# Example data from the question, with the values stored as numbers
# (comparing character strings with > 40 compares them as text, not as numbers)
dat <- data.frame(
  date   = rep("01/01/2000", 12),
  time   = sprintf("%02d:00", 8:19),
  values = c(10, 30, 43, 55, 59, 45, 10, 15, 43, 45, 60, 10),
  stringsAsFactors = FALSE
)
# Run-length encode the logical vector "value above 40"
run <- rle(dat$values > 40)
# Repeat each run's value and length once for every row it covers
dat$exceeds  <- rep(run$values, run$lengths)
dat$duration <- rep(run$lengths, run$lengths)
# First row of every run
starts <- dat[head(c(1, cumsum(run$lengths) + 1), length(run$lengths)), ]
# Keep the runs above 40, including single-hour ones
result <- subset(starts, exceeds)
result[, c("date", "time", "duration")]
date time duration
3 01/01/2000 10:00 4
9 01/01/2000 16:00 3
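The index arithmetic on the run lengths can be checked on the question's values alone. This standalone sketch computes the first row of each run as `cumsum(lengths) - lengths + 1`, which gives the same indices as the `head(c(1, cumsum(...) + 1), ...)` expression above. It then keeps only the runs above 40 (the names `ends`, `starts` and `res` are illustrative).

```r
x   <- c(10, 30, 43, 55, 59, 45, 10, 15, 43, 45, 60, 10)
run <- rle(x > 40)
# run$lengths: 2 4 2 3 1   run$values: FALSE TRUE FALSE TRUE FALSE
ends   <- cumsum(run$lengths)       # last row of each run:  2 6 8 11 12
starts <- ends - run$lengths + 1    # first row of each run: 1 3 7 9 12
res <- data.frame(start_row = starts[run$values],
                  Duration  = run$lengths[run$values])
res
```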
Answer 2 (score: 2)
Some data
txt <- "'01/01/2000' '08:00' '10'
'01/01/2000' '09:00' '30'
'01/01/2000' '10:00' '43'
'01/01/2000' '11:00' '55'
'01/01/2000' '12:00' '59'
'01/01/2000' '13:00' '45'
'01/01/2000' '14:00' '10'
'01/01/2000' '15:00' '15'
'01/01/2000' '16:00' '43'
'01/01/2000' '17:00' '45'
'01/01/2000' '18:00' '60'
'01/01/2000' '19:00' '10'"
# read.table() strips the single quotes; as.is = TRUE keeps V1 and V2 as character
data <- read.table(text = txt, header = FALSE, as.is = TRUE)
The function
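fun <- function(data, cutoff = 40) {
  # 1 for hours above the cutoff, 0 otherwise
  data_above <- 1L * (data$V3 > cutoff)
  # A spell starts where the indicator jumps from 0 to 1
  # (the leading 0 catches a spell that starts on the first row)
  id_start <- which(diff(c(0L, data_above)) == 1)
  # A spell ends where the indicator drops from 1 to 0
  # (the trailing 0 catches a spell that ends on the last row)
  id_end <- which(diff(c(data_above, 0L)) == -1)
  # Start date and time of each spell, plus its length in hours
  res <- cbind(data[id_start, 1:2], Duration = id_end - id_start + 1)
  return(res)
}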
Result
fun(data)
V1 V2 Duration
3 01/01/2000 10:00 4
9 01/01/2000 16:00 3
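The zero padding in `fun` is what lets a spell start on the first row or end on the last one. Here is a minimal check on made-up values (hypothetical, not from the question) where the series both starts and ends above the cutoff:

```r
# Hypothetical values: above 40 on rows 1-2 and 4-5
above <- 1L * (c(50, 60, 10, 45, 70) > 40)   # 1 1 0 1 1
# Leading 0: a 0 -> 1 jump at position i marks a spell starting at row i
id_start <- which(diff(c(0L, above)) == 1)   # 1 4
# Trailing 0: a 1 -> 0 drop at position i marks a spell ending at row i
id_end <- which(diff(c(above, 0L)) == -1)    # 2 5
id_end - id_start + 1                        # durations: 2 2
```

Without the padding, `diff()` would never see a jump into the first spell or a drop out of the last one, and those two spells would be missed.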