Vectorizing a nested for loop

Time: 2020-10-08 16:09:46

Tags: r for-loop iteration vectorization

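I would like to vectorize this nested "for" loop, for two reasons:

  1. It should run faster.
  2. Although the code works on the example data, it does not quite work when I run it on my real data (it works when I expect a negative or positive result, but not when I expect a result of zero). I would like to vectorize it first and see whether that helps. I think it might help because the problem arises in the second loop, as the code loops through X.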
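
On the first point, one well-known source of slowness in R loops is growing a data frame with rbind on every iteration, because the accumulated rows are copied each time. A minimal sketch of the usual alternative follows: collect one result per row in a list and bind once at the end. The function `process_row` is a hypothetical stand-in for the per-row work, not something from this post, and whether this matters in practice depends on how much of the run time goes to Excel I/O instead.

```r
# Hypothetical stand-in for the per-row work (an assumption, not the real calculation).
process_row <- function(row) {
  data.frame(ID = row$id, total = row$var1 + row$var2)
}

inputdata <- data.frame(id = 1:3, var1 = c(5, 4, 2), var2 = c(25, 11, 9))

# Collect one small data frame per input row in a list ...
pieces <- lapply(seq_len(nrow(inputdata)), function(a) process_row(inputdata[a, ]))

# ... and bind them together once, instead of calling rbind inside the loop.
output <- do.call(rbind, pieces)
output
```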

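I have spent two days on Google and have read the questions here about vectorizing loops, but I still cannot do it myself. In theory, I can see that having an indexed list of the n data frames X for the second loop to refer to (rather than creating data frame X on every iteration) might solve my problem, but I have not even managed to do that, let alone vectorize it.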
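
The indexed list described above could be built along these lines. This is only a sketch in base R, assuming the same `inputdata` and `lookup` layout as the example data later in the post; the names `vars` and `X_list` are made up for illustration. Note that `unlist()` on a row with mixed strings and numbers coerces every value to character, much as `t()` does on a data frame row.

```r
inputdata <- data.frame(id = 1:3, var1 = c(5, 4, 2), var2 = c(25, 11, 9),
                        var3 = c(8, 5, 11), var4 = c(1, 2, 3))
lookup <- data.frame(DestSheet = c(NA, 1, 1, 1, 1),
                     DestCol   = c(NA, 2, 1, 4, 3),
                     DestRow   = c(NA, 1, 2, 3, 4),
                     row.names = c("id", "var1", "var2", "var3", "var4"))

# Variables that have a destination cell (this drops 'id').
vars <- rownames(lookup)[!is.na(lookup$DestCol)]

# Build every X once, up front, in the row order of 'lookup'.
X_list <- lapply(seq_len(nrow(inputdata)), function(a) {
  data.frame(Variable  = vars,
             Value     = unlist(inputdata[a, vars], use.names = FALSE),
             DestSheet = lookup[vars, "DestSheet"],
             DestCol   = lookup[vars, "DestCol"],
             DestRow   = lookup[vars, "DestRow"],
             stringsAsFactors = FALSE)
})

# Name the list by id so each X can be looked up directly.
names(X_list) <- as.character(inputdata$id)

X_list[["2"]]  # the X for the row whose id is 2
```

With the list in place, the inner loop only needs to iterate over `X_list[[k]]`; the merge and reordering steps no longer run on every iteration.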

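In short, the function takes an Excel file of input data and uses another Excel file, which is really a map/lookup table, to specify which cells of a third "calculator" Excel workbook the input data go into (i.e., the map/lookup table specifies where in the "calculator" workbook each input value is placed). In this example, the "calculator" workbook is exampleworkbook.xlsx, which is created in the first block of code.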
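
To make the role of the map/lookup table concrete, the sketch below turns each destination row and column into an A1-style cell address. It assumes the same `lookup` layout as the example data later in the post; the `Cell` column is added purely for illustration, and the use of `LETTERS` assumes no destination column beyond Z.

```r
lookup <- data.frame(DestSheet = c(NA, 1, 1, 1, 1),
                     DestCol   = c(NA, 2, 1, 4, 3),
                     DestRow   = c(NA, 1, 2, 3, 4),
                     row.names = c("id", "var1", "var2", "var3", "var4"))

# Keep only the variables that have a destination (drops 'id').
dest <- lookup[!is.na(lookup$DestCol), ]

# LETTERS covers columns A to Z only, so this assumes DestCol <= 26.
dest$Cell <- paste0(LETTERS[dest$DestCol], dest$DestRow)

dest  # var1 goes to B1, var2 to A2, var3 to D3, var4 to C4 (all on sheet 1)
```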

Thanks.

You will need to set a directory so that you can save the example "calculator" Excel workbook (with the necessary formulae) and then load it in:

## Set directory
workingdir <- "yourfilepath"
setwd(workingdir)

## Load packages
library(readxl) # for reading in Excel sheets
library("XLConnect") # needs Java v6 or higher
if (!require('openxlsx')) install.packages('openxlsx')
library(openxlsx) # to create Excel workbooks, no dependency on Java

## Create a blank workbook
wb <- createWorkbook() 

## Add two sheets to the workbook
addWorksheet(wb, "Sheet 1") 
addWorksheet(wb, "Sheet 2")

## Name column 1 in sheet 2
writeData(wb, "Sheet 2", "colsums", startCol = 1, startRow = 1)  

## Specify formulae to be used
v <- c("SUM('Sheet 1'!$A$1:$A$10)", "SUM('Sheet 1'!$B$1:$B$10)", 
       "SUM('Sheet 1'!$C$1:$C$10)", "SUM('Sheet 1'!$D$1:$D$10)")

## Write formulae into column 1 of sheet 2
writeFormula(wb, sheet = 2, x = v, startCol = 1, startRow = 2) 

## Save workbook to working directory
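## (openxlsx:: is explicit because XLConnect also exports saveWorkbook;
##   overwrite = TRUE lets the script be re-run)
openxlsx::saveWorkbook(wb, "exampleworkbook.xlsx", overwrite = TRUE)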

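Now here is the code I want to vectorize:
(Note: the dummy input data here are all numeric, to keep things simple, but the real data contain both strings and numbers.)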

## Load the workbook
exampleworkbook <- XLConnect::loadWorkbook("exampleworkbook.xlsx")
## Keep the formatting of the original document
setStyleAction(exampleworkbook,XLC$"STYLE_ACTION.NONE")  

## Create example data
inputdata <- data.frame(id = 1:3, var1 = c(5,4,2), var2 = c(25,11,9), 
                        var3 = c(8,5,11), var4 = c(1,2,3))
lookup <- data.frame(DestSheet = c(NA,1,1,1,1), 
                     DestCol = c(NA,2,1,4,3), 
                     DestRow = c(NA,1,2,3,4) ) 
row.names(lookup) = c("id","var1","var2","var3","var4")

## Write the function
getresult <- function(DF){

  output = data.frame(fix.empty.names = FALSE) # create an output df to hold the results
 
  for (a in seq_len(nrow(DF))) { # loop through rows of 'inputdata'
    X = DF[a,]  # take row a as a one-row data frame
    X <- t(X)  # transpose X into a one-column matrix (to later allow it to be merged with 'lookup' df)
    ID = X["id",]
    print.default(a) # so can see which iteration is occurring
    ## Add row numbers to enable sorting back into original order after merge 
    ##   (because IDs are strings in the real thing and it's easier to
    ##   trouble-shoot if the variables in X are in the original order):
    X <- cbind(X, seq.int(nrow(X)) ) 
    X <- merge(X, lookup, by = "row.names") # gives destinations of each variable
    X <- X[!is.na(X$DestCol),]  # removes unnecessary data i.e. ID variable
    X <- setNames(X, c("Variable","Value","OrigRow","DestSheet","DestCol","DestRow")) 
    X <- X[order(X$OrigRow), ]
    #X <- X[X$Variable != "var5", ] # need to be able to remove variables if desired
    
    for (i in seq_len(nrow(X))) {
      ## For loop extracts the value for each variable then 
      ##   writes it to the specified destination cell in the Excel worksheet
      b = X$Value[i]
      c = X$DestSheet[i]
      d = X$DestRow[i]
      e = X$DestCol[i]
      writeWorksheet(exampleworkbook, b, c, d, e, header = FALSE)
    }
    
    ## Read results from sheet 2, startRow = 1, startCol = 1, endRow = 6, endCol = 1; 
    ##   returns a data.frame
    results = readWorksheet(exampleworkbook,2,1,1,6,1) 
    results[is.na(results)] = 0
    results <- setNames(results, "Value")
    ## create a df containing results and their IDs 
    results <- data.frame(c(ID,results$Value[1],results$Value[2],
                            results$Value[3],results$Value[4]),
                          fix.empty.names = FALSE) 
    output <- rbind(output,t(results))
  }
  
  ## rename columns
  output <- setNames(output,c("ID", "sumColA", "sumColB", "sumColC", "sumColD"))  
  return(output)
}

getresult(inputdata)  
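
The note before this code says the real data mix strings and numbers. One thing that may be worth checking in that case is what `t()` does to a mixed-type row: it goes through a matrix, and a matrix holds a single type, so every value becomes a character string. The sketch below uses made-up data (`mixed`, `numeric_only`) to show the difference. Whether this is related to the zero-result problem is only a guess; the values may end up in the workbook as text rather than numbers, which is something to verify rather than assume.

```r
# Made-up one-row examples, not the real data.
mixed        <- data.frame(id = "a1", var1 = 0, var2 = 25, stringsAsFactors = FALSE)
numeric_only <- data.frame(id = 1,    var1 = 0, var2 = 25)

X_mixed   <- t(mixed[1, ])         # string id forces the whole matrix to character
X_numeric <- t(numeric_only[1, ])  # all-numeric row stays numeric

is.character(X_mixed)   # TRUE: var1 is now the string "0", not the number 0
is.numeric(X_numeric)   # TRUE

# Converting back explicitly before writing is one possible guard:
as.numeric(X_mixed["var1", 1])
```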


0 answers:

No answers yet.