R: Create a function or loop to change values based on two (or more) conditions

Asked: 2018-01-31 11:12:55

Tags: r function if-statement

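As you can see from my account age, I'm new here.

I'm having trouble creating a function or loop that replaces a single value in a row based on two or more conditions. Here is my sample dataset: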

        date timeslot volume lag1
1 2018-01-17        3    553  296
2 2018-01-17        4     NA  553
3 2018-01-18        1     NA   NA
4 2018-01-18        2     NA   NA
5 2018-01-18        3     NA   NA
6 2018-01-18        4     NA   NA

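The column types are: Date, int, num, num

I want to create a function that replaces the NA values in lag1 with the mean of the last 5 similar timeslots. That value is computed as: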

w <- as.integer(mean(tail(data$volume[data$timeslot %in% c(1)],5), na.rm =TRUE ))

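If I write an if statement inside a for loop, it returns "the condition has length > 1 and only the first element will be used"

So far I can only change either all of the lag1 values or none of them.

The function should work like this: if lag1 == NA & timeslot == 1, then change the value in that row to w

What I have tried so far: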

for(i in data$lag1){
  if(data$timeslot== '1'){
    data$lag1[is.na(data$lag1)]<-w
  }else(data$lag1<-data$lag1)
}

And also:

data$lag1<- ifelse(data$timeslot== "1", is.na(data$lag1)<-w, data$lag1 ) 

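This does run, but it changes all of the values at once. It should only change the one value that sits in the same row as the matching timeslot.

Most of the time it returns the error above. I suspect it has something to do with the "timeslot" column.

I tried a few other things, but since I like to keep a clean R environment, most of those attempts have been deleted.

I can't seem to figure this out. I hope you can point me in the right direction.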

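1 Answer:

Answer 0: (score: 0)

Overview

I created the ReplaceNALag1WithSimilarRecentTimeslots() function, which replaces NA df$lag1 values with the mean of the last 5 non-NA df$lag1 values for each unique df$timeslot value.

Using sapply() means ReplaceNALag1WithSimilarRecentTimeslots() only has to be written once, because sapply() applies the logic to each element of X. In this case, X is the vector of unique df$timeslot values whose rows also contain NA df$lag1 values.

Because the reproducible data contains no earlier non-NA df$lag1 values for timeslots 1 and 2, NaN values are introduced for those rows.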
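The NaN values come from taking the mean of an empty vector. Here is a minimal sketch of that behaviour, and of how sapply() calls a function once per element of X. The helper name mean.of.last.five and the toy vectors are made up for illustration and are not part of the script below.

```r
# toy data (illustrative, not the asker's): lag1 values and their timeslots
lag1     <- c( 296, 553, NA, NA )
timeslot <- c( 3L, 4L, 1L, 3L )

# hypothetical helper: mean of the last 5 non-NA lag1 values for one timeslot
mean.of.last.five <- function( slot ){
  mean( tail( lag1[ !is.na( lag1 ) & timeslot == slot ], n = 5 ) )
}

# sapply() calls the helper once for each element of X
result <- sapply( X = c( 1L, 3L ), FUN = mean.of.last.five )

# timeslot 1 has no non-NA lag1 values, so the helper computes
# mean( numeric(0) ), which is NaN;
# timeslot 3 has a single non-NA value (296), so its mean is 296
result
```

If NaN is not wanted, one option is to check the length of the selected values before averaging and leave the entry as NA when nothing is available.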

# create data frame
df <- 
  data.frame(
    date = as.Date( x = c( 
      paste("2018"
            , "01"
            , rep( x = "17", times = 2 )
            , sep = "-" 
      )
      , paste( "2018"
               , "01"
               , rep( x = "18", times = 4 )
               , sep = "-" 
      )
    ) 
    )
    , timeslot = as.integer( c( 3, 4, 1, 2, 3, 4 ) )
    , volume = c( 533, rep( x = NA, times = 5 ) )
    , lag1 = c( 296, 553, rep( x = NA, times = 4 ) )
    , stringsAsFactors = FALSE
  )

# ensure that the data frame
# is ordered by date,
# so that rows with a date value closer to today
# appear at the end of the data frame
df <- df[ order( df$date ) , ]

# view results
df
#         date timeslot volume lag1
# 1 2018-01-17        3    533  296
# 2 2018-01-17        4     NA  553
# 3 2018-01-18        1     NA   NA
# 4 2018-01-18        2     NA   NA
# 5 2018-01-18        3     NA   NA
# 6 2018-01-18        4     NA   NA


# create a function that
# replaces NA lag1 values
# with the average of the
# last 5 lag1 values for 
# each unique timeslot value
ReplaceNALag1WithSimilarRecentTimeslots <- function( unique.timeslot.value ){

  # create a condition that
  # pulls out the non-NA lag1 values for a particular timeslot
  # but only keeps the 5 most recent ones,
  # assuming that elements near the end of the vector
  # are more recent than elements near the beginning
  non.na.lag1.condition.by.timeslot <- 
    tail(
      x = which( !is.na( df$lag1 ) & df$timeslot == unique.timeslot.value )
      , n = 5
    )

  # calculate the average lag1 value
  # for those similar non NA lag1 values
  # for that particular timeslot
  mean( df$lag1[ non.na.lag1.condition.by.timeslot ] ) 


} # end of ReplaceNALag1WithSimilarRecentTimeslots() function

# create the NA lag1 condition
na.lag1.condition <- which( is.na( df$lag1 ) )

# use ReplaceNALag1WithSimilarRecentTimeslots()
# on those NA lag1 values
df$lag1[ na.lag1.condition ] <-
  sapply( X = unique( df$timeslot[ na.lag1.condition ] )
          , FUN = function( i ) ReplaceNALag1WithSimilarRecentTimeslots( i )
          , simplify = TRUE
          , USE.NAMES = TRUE
          )

# View the results
df
#         date timeslot volume lag1
# 1 2018-01-17        3    533  296
# 2 2018-01-17        4     NA  553
# 3 2018-01-18        1     NA  NaN
# 4 2018-01-18        2     NA  NaN
# 5 2018-01-18        3     NA  296
# 6 2018-01-18        4     NA  553

# end of script #
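
For the single condition described in the question (lag1 is NA and timeslot is 1), a for loop with if() is not needed. if() expects a single TRUE or FALSE, whereas data$timeslot == 1 is a whole vector, which is what triggers the "condition has length > 1" message. The following is a minimal sketch that uses logical indexing instead, followed by a per-timeslot variant built on base R's ave() that also copes with several NA rows sharing a timeslot. The data frame dat, its values, and the helper fill.na.with.recent.mean are assumptions made up for illustration.

```r
# illustrative data; the values are invented, not the asker's
dat <- data.frame(
  timeslot = c( 1L, 2L, 1L, 2L, 1L, 2L, 1L )
  , volume = c( 10, 20, 30, 40, NA, NA, NA )
  , lag1   = c( 5, 12, 20, 30, 40, NA, NA )
)

# 1) single condition, as in the question:
#    w is the mean of the last 5 volume values for timeslot 1
w <- as.integer( mean( tail( dat$volume[ dat$timeslot %in% 1 ], 5 ), na.rm = TRUE ) )

# a logical vector with one TRUE/FALSE per row; & combines element-wise
rows.to.fix <- is.na( dat$lag1 ) & dat$timeslot == 1

# only the rows where both conditions hold are overwritten
dat$lag1[ rows.to.fix ] <- w

# 2) all timeslots at once (hypothetical helper):
#    fill the NAs in one group with the mean of its last 5 non-NA values;
#    if the group has no non-NA values, the NAs are left untouched
fill.na.with.recent.mean <- function( x ){
  recent <- tail( x[ !is.na( x ) ], n = 5 )
  if( length( recent ) > 0 ) x[ is.na( x ) ] <- mean( recent )
  x
}

# ave() splits lag1 by timeslot, applies the helper to each group,
# and returns the values in their original row order
dat$lag1 <- ave( dat$lag1, dat$timeslot, FUN = fill.na.with.recent.mean )

dat
```

Note that the if() inside the helper is safe because length( recent ) > 0 is always a single value. Step 1 follows the question in averaging volume, while step 2 follows the answer in averaging lag1; which column to average is a choice that depends on the real data.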