Writing a loop-based function that flags values outside per-column thresholds

Time: 2015-03-31 07:29:06

Tags: r

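I have a data frame (data) with 639 observations and 6 columns. Each cell holds a time in seconds. I have computed thresholds for each column.

This is what I have done so far: I computed the thresholds per column, so there is one lower threshold (threshold1) and one upper threshold (threshold2) for each of the 6 columns.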

threshold1
[1] 16 22 31  6 11 13

threshold2
[1] 200.0 275.0 387.5  75.0 137.5 162.5

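These thresholds give the minimum and maximum number of seconds for each column. I want to do the following for every column: in column 1, for example, flag all cells with a value below 16 seconds and all cells with a value above 200 seconds.

I did this as follows: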

column1<-ifelse(data$column1<threshold1[1],"speeder",     
         ifelse(data$column1>threshold2[1], "slower",1))


column2<-ifelse(data$column2<threshold1[2],"speeder",     
         ifelse(data$column2>threshold2[2], "slower",1))

column3<-ifelse(data$column3<threshold1[3],"speeder",     
         ifelse(data$column3>threshold2[3], "slower",1))

and so on for all 6 columns.

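Now I want to write this as a loop so that I do not have to type the ifelse call by hand each time, because I have several data sets with different numbers of columns.

2 answers:

Answer 0: (score: 1)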

First, generate some sample data named "dat":

dat <- data.frame(
    column1 = runif(n = 638, min=0, max=220),
    column2 = runif(n = 638, min=0, max=300),
    column3 = runif(n = 638, min=0, max=400),
    column4 = runif(n = 638, min=0, max=100),
    column5 = runif(n = 638, min=0, max=150),
    column6 = runif(n = 638, min=0, max=200))

# define thresholds    
threshold1 <- c(16, 22, 31, 6, 11, 13)
threshold2 <- c(200.0, 275.0, 387.5, 75.0, 137.5, 162.5)

Using a loop

# Declare a list that will contain the results
results <- list()

# Loop over the columns
for(i in seq_len(ncol(dat))) {
    results[[colnames(dat)[i]]] <- ifelse(dat[,i] < threshold1[i],
                                          yes = "speeder", 
                                          no = ifelse(dat[,i] > threshold2[i], 
                                                      yes = "slower", no = 1))
}
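
Since the question mentions data sets with different numbers of columns, the loop could be wrapped in a function. The following is only a sketch: the name classify_columns and the toy data are made up for illustration, and the default label is kept as the string "1" to match the output shown in this answer.

```r
# Hypothetical helper (not from the original answer): label every column
# of `df` against per-column lower and upper thresholds.
classify_columns <- function(df, lower, upper) {
  # One lower and one upper threshold is expected per column
  stopifnot(length(lower) == ncol(df), length(upper) == ncol(df))
  out <- vector("list", ncol(df))
  names(out) <- colnames(df)
  for (i in seq_len(ncol(df))) {
    x <- df[[i]]
    labels <- rep("1", length(x))      # default label for values in range
    labels[x < lower[i]] <- "speeder"  # below the lower threshold
    labels[x > upper[i]] <- "slower"   # above the upper threshold
    out[[i]] <- labels
  }
  out
}

# Made-up two-column example
toy <- data.frame(a = c(5, 50, 500), b = c(1, 100, 30))
res <- classify_columns(toy, lower = c(16, 22), upper = c(200, 275))
res$a
res$b
```

The length check at the top stops early if the threshold vectors do not match the number of columns, which is easy to get wrong when switching between data sets.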

Using lapply

You can also use lapply instead of a loop, as follows:

results <- lapply(seq_along(dat), function(x) {
    ifelse(dat[,x] < threshold1[x],
           yes = "speeder", 
           no = ifelse(dat[,x] > threshold2[x],
                       yes = "slower", no = 1))
})

names(results) <- colnames(dat)
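
As a variation on the lapply call above, Map() can walk the columns and both threshold vectors in parallel, which avoids indexing by position. This is a sketch on made-up toy data; the names toy, lower, upper and results_map are assumptions for illustration only.

```r
# Toy inputs, made up for illustration
toy <- data.frame(a = c(5, 50, 500), b = c(1, 100, 30))
lower <- c(16, 22)
upper <- c(200, 275)

# Map() passes the i-th column together with the i-th lower and upper threshold
results_map <- Map(function(x, lo, hi) {
  ifelse(x < lo, "speeder", ifelse(x > hi, "slower", "1"))
}, toy, lower, upper)

# The list elements inherit the column names of the data frame
names(results_map)
results_map$a
```

Because the data frame is the first argument after the function, the result is already named by column, so a separate names(results) <- colnames(dat) step is not needed here.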

Results

You can access the results with results[[1]] through results[[6]], or with results$column1 through results$column6:

> head(results$column1, 100)

  [1] "1"       "1"       "1"       "1"       "1"       "1"       "slower" 
  [8] "1"       "slower"  "1"       "1"       "1"       "speeder" "1"      
 [15] "1"       "1"       "1"       "1"       "1"       "1"       "1"      
 [22] "slower"  "1"       "1"       "1"       "1"       "1"       "1"      
 [29] "1"       "1"       "1"       "slower"  "1"       "slower"  "slower" 
 [36] "1"       "1"       "1"       "1"       "speeder" "1"       "1"      
 [43] "1"       "1"       "speeder" "speeder" "1"       "1"       "slower" 
 [50] "1"       "1"       "slower"  "1"       "1"       "1"       "1"      
 [57] "1"       "1"       "1"       "1"       "1"       "1"       "1"      
 [64] "1"       "1"       "1"       "1"       "slower"  "1"       "1"      
 [71] "slower"  "1"       "1"       "1"       "speeder" "1"       "1"      
 [78] "1"       "1"       "1"       "1"       "slower"  "1"       "1"      
 [85] "1"       "1"       "1"       "1"       "1"       "1"       "1"      
 [92] "1"       "1"       "1"       "1"       "1"       "1"       "1"      
 [99] "speeder" "1" 
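
If you want an overview instead of the raw labels, one option is to count the labels per column. The sketch below uses a small hand-written results list (not the random data above) so that it runs on its own; labels_df and counts are names chosen for illustration.

```r
# Small hand-written stand-in for the `results` list built above
results <- list(column1 = c("1", "slower", "speeder", "1"),
                column2 = c("speeder", "1", "1", "1"))

# Bind the label vectors back into a data frame, one column per input column
labels_df <- as.data.frame(results, stringsAsFactors = FALSE)

# Count each label per column; fixing the factor levels keeps labels
# that never occur in a column as zero counts
counts <- sapply(results, function(x) {
  table(factor(x, levels = c("speeder", "1", "slower")))
})
counts
```

Fixing the levels matters because table() would otherwise return vectors of different lengths for different columns, and sapply() could then not combine them into one matrix.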

Answer 1: (score: 0)

You can also try lapply. It is more compact than an explicit loop, although it is not necessarily faster.

dat <- data.frame(
  column1 = runif(n = 638, min=0, max=220),
  column2 = runif(n = 638, min=0, max=300),
  column3 = runif(n = 638, min=0, max=400),
  column4 = runif(n = 638, min=0, max=100),
  column5 = runif(n = 638, min=0, max=150),
  column6 = runif(n = 638, min=0, max=200))

# define thresholds    
threshold1 <- c(16, 22, 31, 6, 11, 13)
threshold2 <- c(200.0, 275.0, 387.5, 75.0, 137.5, 162.5)

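# Use ncol(dat) rather than a hard-coded 6 so this works for any number of columns
result <- matrix(unlist(lapply(seq_len(ncol(dat)), function(i){
  ifelse(dat[,i] < threshold1[i],
         yes = "speeder", 
         no = ifelse(dat[,i] > threshold2[i], 
                     yes = "slower", no = 1))
})), ncol = ncol(dat), byrow = FALSE)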

head(result)
     [,1]      [,2] [,3] [,4]     [,5] [,6]
[1,] "speeder" "1"  "1"  "slower" "1"  "1" 
[2,] "1"       "1"  "1"  "1"      "1"  "1" 
[3,] "1"       "1"  "1"  "1"      "1"  "1" 
[4,] "1"       "1"  "1"  "slower" "1"  "1" 
[5,] "1"       "1"  "1"  "1"      "1"  "1" 
[6,] "1"       "1"  "1"  "slower" "1"  "1"
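
A character matrix like the one above can also be built without looping over columns at all. The following sketch is an alternative technique, not what this answer does: it compares the whole matrix against the thresholds with sweep(). The toy data and the names m, too_fast, too_slow and labels are assumptions for illustration.

```r
# Toy inputs, made up for illustration
toy <- data.frame(a = c(5, 50, 500), b = c(1, 100, 30))
lower <- c(16, 22)
upper <- c(200, 275)

m <- as.matrix(toy)

# sweep() with MARGIN = 2 compares each column with its own threshold
too_fast <- sweep(m, 2, lower, "<")
too_slow <- sweep(m, 2, upper, ">")

# Start from the default label and overwrite the flagged cells
labels <- matrix("1", nrow = nrow(m), ncol = ncol(m), dimnames = dimnames(m))
labels[too_fast] <- "speeder"
labels[too_slow] <- "slower"
labels
```

This keeps the column names of the data frame on the result, which the matrix(unlist(...)) version drops.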