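As you can see from my account age, I am new here.
I am having trouble creating a function or loop that replaces a single value in a row based on two or more conditions. Here is my sample dataset: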
        date timeslot volume lag1
1 2018-01-17        3    553  296
2 2018-01-17        4     NA  553
3 2018-01-18        1     NA   NA
4 2018-01-18        2     NA   NA
5 2018-01-18        3     NA   NA
6 2018-01-18        4     NA   NA
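The column types are: Date, int, num, num.
I want to create a function that replaces the NA values in lag1 with the average of the last 5 similar timeslots. That value is calculated with: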
w <- as.integer(mean(tail(data$volume[data$timeslot %in% c(1)],5), na.rm =TRUE ))
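If I write an if statement or a for loop, it returns "the condition has length > 1 and only the first element will be used".
So far I have only managed to change either all of the lag1 values or none of them.
The function should work like this: if lag1 is NA & timeslot == 1, then change the value in that row to w.
What I have tried so far: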
for(i in data$lag1){
if(data$timeslot== '1'){
data$lag1[is.na(data$lag1)]<-w
}else(data$lag1<-data$lag1)
}
And also:
data$lag1<- ifelse(data$timeslot== "1", is.na(data$lag1)<-w, data$lag1 )
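This does run, but it changes all values at once. It should only change the single value in the row where timeslot is 1.
Most of the time it returns the error above. I suspect it has something to do with the "timeslot" column.
I tried a few other things, but since I like a clean R environment, most of them have been deleted.
I can't seem to figure this out. I hope you can point me in the right direction.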
Answer 0 (score: 0)
I created the ReplaceNALag1WithSimilarRecentTimeslots() function, which replaces NA df$lag1 values with the mean of the last 5 non-NA df$lag1 values for each unique df$timeslot value.
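Using sapply() means ReplaceNALag1WithSimilarRecentTimeslots() only has to be written once, because sapply() applies the same logic to each element of X. In this case, X is the vector of unique df$timeslot values whose rows also contain NA df$lag1 values.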
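To illustrate how sapply() walks over the unique values, here is a minimal sketch on toy data. The vectors slots and values and the name means.by.slot are made up for this illustration and are not part of the question's data.

```r
# hypothetical toy data, not taken from the question
slots  <- c(1L, 2L, 1L, 2L, 1L)
values <- c(10, 20, NA, 40, 30)

# call the anonymous function once per unique slot value;
# each call returns the mean of the non-NA values for that slot
means.by.slot <- sapply(
  X = unique(slots),
  FUN = function(s) mean(values[slots == s], na.rm = TRUE)
)

means.by.slot
# [1] 20 30
```

The result has one element per unique slot, in the order returned by unique(slots).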
Note: in the sample data, timeslots 1 and 2 have no non-NA df$lag1 values, so NaN is introduced for those rows.
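The NaN comes from taking the mean of an empty vector. A small sketch of that behavior follows, with an optional step that turns NaN back into NA; the names empty.mean, lag1.filled and cleaned are hypothetical and used only here.

```r
# the mean of a zero-length numeric vector is NaN, not NA
empty.mean <- mean(numeric(0))
is.nan(empty.mean)
# [1] TRUE

# hypothetical filled column containing a NaN
lag1.filled <- c(296, NaN, 553)

# optional: convert NaN back to NA if NA is preferred
cleaned <- lag1.filled
cleaned[is.nan(cleaned)] <- NA
cleaned
# [1] 296  NA 553
```

Whether to keep NaN or convert it to NA depends on how later code treats missing values; is.na() returns TRUE for both.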
# create data frame
df <-
  data.frame(
    date = as.Date( c( rep( x = "2018-01-17", times = 2 )
                       , rep( x = "2018-01-18", times = 4 )
                    )
           )
    , timeslot = as.integer( c( 3, 4, 1, 2, 3, 4 ) )
    , volume = c( 533, rep( x = NA, times = 5 ) )
    , lag1 = c( 296, 553, rep( x = NA, times = 4 ) )
    , stringsAsFactors = FALSE
  )
# ensure that the data frame
# is ordered by date,
# so that rows with a date value closer to today
# appear at the end of the data frame
df <- df[ order( df$date ) , ]
# view results
df
#         date timeslot volume lag1
# 1 2018-01-17        3    533  296
# 2 2018-01-17        4     NA  553
# 3 2018-01-18        1     NA   NA
# 4 2018-01-18        2     NA   NA
# 5 2018-01-18        3     NA   NA
# 6 2018-01-18        4     NA   NA
# create a function that
# replaces NA lag1 values
# with the average of the
# last 5 lag1 values for
# each unique timeslot value
ReplaceNALag1WithSimilarRecentTimeslots <- function( unique.timeslot.value ){
  # find the rows of df (the data frame created above)
  # that have a non NA lag1 for a particular timeslot,
  # keeping only the 5 most recent rows,
  # assuming that elements that appear at the end of the vector
  # are more recent than elements that appear near the beginning of the vector
  non.na.lag1.condition.by.timeslot <-
    tail(
      x = which( !is.na( df$lag1 ) & df$timeslot == unique.timeslot.value )
      , n = 5
    )
  # calculate the average lag1 value
  # for those similar non NA lag1 values
  # for that particular timeslot
  mean( df$lag1[ non.na.lag1.condition.by.timeslot ] )
} # end of ReplaceNALag1WithSimilarRecentTimeslots() function
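# create the NA lag1 condition
na.lag1.condition <- which( is.na( df$lag1 ) )
# unique timeslot values among the rows with an NA lag1
na.lag1.timeslots <- unique( df$timeslot[ na.lag1.condition ] )
# use ReplaceNALag1WithSimilarRecentTimeslots()
# once per unique timeslot value
lag1.replacements <-
  sapply( X = na.lag1.timeslots
          , FUN = ReplaceNALag1WithSimilarRecentTimeslots
          , simplify = TRUE
  )
# match each NA row to the mean for its own timeslot,
# so that timeslot values that occur in several NA rows
# are filled correctly instead of relying on recycling
df$lag1[ na.lag1.condition ] <-
  lag1.replacements[ match( df$timeslot[ na.lag1.condition ], na.lag1.timeslots ) ]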
# View the results
df
#         date timeslot volume lag1
# 1 2018-01-17        3    533  296
# 2 2018-01-17        4     NA  553
# 3 2018-01-18        1     NA  NaN
# 4 2018-01-18        2     NA  NaN
# 5 2018-01-18        3     NA  296
# 6 2018-01-18        4     NA  553
# end of script #
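As a side note on the rule stated in the question ("if lag1 is NA & timeslot == 1, change that row to w"), the warning "the condition has length > 1" appears because if() expects a single TRUE or FALSE, while data$timeslot == '1' is a whole vector. One possible approach is to combine both conditions into a logical index. This is only a sketch: the value of w below is a hypothetical placeholder, because with the sample data the question's formula has no volume values for timeslot 1 to average.

```r
# sample data as shown in the question
data <- data.frame(
  date = as.Date(c("2018-01-17", "2018-01-17", "2018-01-18",
                   "2018-01-18", "2018-01-18", "2018-01-18")),
  timeslot = c(3L, 4L, 1L, 2L, 3L, 4L),
  volume = c(553, NA, NA, NA, NA, NA),
  lag1 = c(296, 553, NA, NA, NA, NA)
)

# hypothetical replacement value, standing in for the question's w
w <- 100L

# TRUE only for rows where lag1 is NA AND timeslot is 1
rows.to.fix <- is.na(data$lag1) & data$timeslot == 1

# replace only those rows; all other lag1 values are untouched
data$lag1[rows.to.fix] <- w

data$lag1
# [1] 296 553 100  NA  NA  NA
```

Because the index is built with the vectorized & operator, no loop or if() is needed, and only row 3 changes.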