How do I vectorize a while loop?

Asked: 2016-10-31 12:38:44

Tags: r dataframe while-loop vectorization

Consider the following code:

vectorize.me = function(history, row.idx=1, row.val=0, max=100){
  while (row.idx < max & row.val < max) {
    row.idx <- row.idx + 1
    entry <- paste('row.idx: ', row.idx, ' row.val: ', row.val)
    history[row.idx] <- entry
    print(entry)
  }
  return(history)
}

max <- 100
history <- vectorize.me(vector('list', max), max=max)

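I would like to do the following:

  1. Instead of passing the row.idx and row.val arguments, pass a data frame to the vectorize.me function and have it operate on the index and value of each row of the data frame.
  2. Remove the while loop and simply stop iterating when the stopping condition is met.
  3. Return the history list once iteration is complete.

How would I go about this?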

    df <- data.frame(sample(0:100,1000,rep=TRUE))
    history <- vectorize.me(df, vector('list', max), max=max)
    

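Edit: This is a completely contrived example. I devised it because I wanted some sample code that passes a value on to the next "iteration" in vectorized code (i.e. apply, lapply, mapply, etc.).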

2 Answers:

Answer 0: (score: 1)

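You can apply cumprod to a series of zeros and ones to get a series that drops to 0 at the first zero in the original series and stays at 0 from then on. This can be used to limit the length of history and the items to print.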

Not as a function, just as plain code:

df <- data.frame(ids=seq(1,1000),val=sample(0:100,1000,rep=TRUE))
valmax<-80
pyn<-cumprod(df$val<valmax)
history<-paste("row.idx",df$ids[pyn>0],"row.val",df$val[pyn>0])
print(history)

You may need to add some checks and conditions to turn this into good code, but in principle this solves the problem.
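To see why this works, it helps to look at cumprod on a tiny vector first and then wrap the idea in a function. The following is a minimal sketch, not part of the original answer: the function name take.while.below and its arguments are hypothetical, and it assumes the value vector is numeric with no NA values. It also guards against one of the cases that needs a check: when nothing is kept, paste() would otherwise recycle the empty vectors to "" and return a single malformed entry.

```r
# cumprod() treats TRUE as 1 and FALSE as 0, so the running product
# stays at 1 until the first FALSE and is 0 from then on.
keep <- cumprod(c(TRUE, TRUE, FALSE, TRUE))
print(keep)  # 1 1 0 0

# Hypothetical wrapper around the same idea (names are illustrative only).
# Assumes 'vals' is numeric and contains no NA values.
take.while.below <- function(ids, vals, valmax) {
  stopifnot(length(ids) == length(vals), is.numeric(vals), !anyNA(vals))
  keep <- cumprod(vals < valmax) > 0   # TRUE only for the leading run below valmax
  if (!any(keep)) {
    # paste() recycles zero-length arguments to "", so return early instead
    return(character(0))
  }
  paste("row.idx", ids[keep], "row.val", vals[keep])
}

history <- take.while.below(ids = 1:5, vals = c(10, 20, 90, 30, 40), valmax = 80)
print(history)  # "row.idx 1 row.val 10" "row.idx 2 row.val 20"
```

The third value (90) is the first one that is not below valmax, so only the first two rows are kept even though later values are below the limit again.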

Answer 1: (score: 0)

How about the following:

vectorize.me <- function(df, var, history, max=100) {
  #-- Compute the max index in df to process (this is the "stopping condition" of the "loop")
  # Find the occurrence of the first index in df[,var] that is larger than 'max'
  # (note the fictitious FALSE and TRUE values added to the condition on df[,var]
  # in order to consider boundary conditions in one go)
  indmax <- min( which( c(FALSE, df[,var] > max, TRUE) ) ) - 2

  if (indmax > 0) { # There is at least one index to process
    # Limit indmax to the length of 'history'
    indmax <- min(indmax, length(history))
    ind <- 1:indmax
    entries <- paste('idx:', ind, 'val:', df[ind,var])
    history[ind] <- entries
    print(entries)
  }

  return(history)
}

#-- Test
# Test data
df <- data.frame(x=c(5, 8, 9, 8, 10, 4, 1, 3))

# Run tests
history <- vector('list', 8)
history <- vectorize.me(df, "x", history, max=8)   # first value larger than 'max' is found in a middle row
history <- vectorize.me(df, "x", history, max=4)   # first value in data frame is larger than 'max'
history <- vectorize.me(df, "x", history, max=max(df[,"x"]))      # all values in data frame are <= 'max'
history <- vectorize.me(df, "x", history, max=max(df[,"x"]) + 1)  # 'max' is larger than the maximum value in df[,var]
history <- vector('list', 6)
history <- vectorize.me(df, "x", history, max=max(df[,"x"]))      # 'history' is shorter than the maximum index of df to process
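
The padding trick used to compute indmax can be checked in isolation. A minimal sketch follows; the helper name count.leading.ok is hypothetical and not part of the answer, and it assumes a numeric vector without NA values.

```r
# Hypothetical helper: number of leading elements of x that are <= max.
# Assumes x is numeric with no NA values.
count.leading.ok <- function(x, max) {
  # The leading FALSE and trailing TRUE are padding: the TRUE guarantees
  # that which() finds a hit even when no element exceeds 'max'.
  # Subtracting 2 undoes the padding offset and steps back to the last
  # element before the first violation.
  min(which(c(FALSE, x > max, TRUE))) - 2
}

x <- c(5, 8, 9, 8, 10, 4, 1, 3)
print(count.leading.ok(x, 8))    # 2: the third element (9) is the first one above 8
print(count.leading.ok(x, 4))    # 0: the first element already exceeds 4
print(count.leading.ok(x, 10))   # 8: no element exceeds 10
```

For NA-free input, sum(cumprod(x <= max)) counts the same leading run without any padding, which may be easier to read.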

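Notes:

  • The var argument specifies the name of the data frame column to which the max condition is applied.
  • The input arguments are not checked for validity.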
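
Since the function does not validate its arguments, callers may want to check them first. Below is a minimal sketch of such checks, assuming the argument conventions of the function above. The helper name check.inputs is hypothetical, and which conditions count as errors is a design choice rather than something the answer specifies.

```r
# Hypothetical validation helper for the arguments of
# vectorize.me(df, var, history, max); not part of the original answer.
check.inputs <- function(df, var, history, max) {
  stopifnot(
    is.data.frame(df),
    is.character(var), length(var) == 1,
    var %in% names(df),            # the column must exist
    is.numeric(df[[var]]),         # the comparison with 'max' needs numbers
    is.vector(history),            # lists and atomic vectors both qualify
    is.numeric(max), length(max) == 1, !is.na(max)
  )
  invisible(TRUE)
}

df <- data.frame(x = c(5, 8, 9))
check.inputs(df, "x", vector("list", 3), max = 8)   # passes silently

# A misspelled column name is reported up front instead of failing later.
result <- tryCatch(check.inputs(df, "y", vector("list", 3), max = 8),
                   error = function(e) "invalid input")
print(result)
```

Calling such a helper at the top of vectorize.me would turn silent misbehavior into an immediate error.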