Question

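I have a list of data.frames containing the data for each stage of a chemical process. Every data.frame has the same columns in the same order, but the number of rows can differ between them.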

See the example data below; the only difference is that fruits stand in for the chemicals and reagents.

I have written a function that scales up the raw data and adds the scaled values to the original data frame as new columns.

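I have two problems. First, the scale factor I apply is really only correct for the last element of the last data.frame, and that factor is then applied to the whole of the last data.frame. I can generate the scale factor for the next-to-last data.frame by taking the weights of the fruit (chemical) the two data frames have in common (always the last row of one and the first row of the next) and dividing the weights in the same way the first scale factor was obtained, then multiplying through the whole data.frame, and repeating until I reach the first data.frame.

The second problem: when I use lapply to apply scale_up to the list, how do I supply these scale factors so that each factor is applied only to its own data frame?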

example.data <- list(
  ## use `=` so the list elements are named; `<-` inside list() would
  ## create global variables and leave the elements unnamed
  stage1 = data.frame(code=c("aaa", "ooo", "bbb"),
                      stuff=c("Apples","Oranges","Bananas"),
                      Mw=c(1,2,3),
                      Density=c(3,2,1),
                      Assay=c(8,9,1),
                      Wt=c(1,2,3), stringsAsFactors = FALSE),
  stage2 = data.frame(code=c("bbb","mmm","ccc","qqq","ggg"),
                      stuff=c("Bananas","Mango","Cherry","Quince","Gooseberry"),
                      Mw=c(8,9,10,1,2),
                      Density=c(23,32,55,5,4),
                      Assay=c(0.1,0.3,0.4,0.4,0.9),
                      Wt=c(45,23,56,99,2), stringsAsFactors = FALSE),
  stage3 = data.frame(code=c("ggg","bbb","ggg","bbb"),
                      stuff=c("Gooseberry","Bread","Grapes","Butter"),
                      Mw=c(9,8,9,10),
                      Density=c(34,45,67,88),
                      Assay=c(10,10,46,52),
                      Wt=c(24,56,31,84), stringsAsFactors = FALSE)
)

scale_up <- function(inventory,scale_factor,vessel_volume_L, NoBatches = 1) {
  ## This function accepts a data.frame with code, stuff, Mw, Density,
  ## Assay and Wt columns.
  ## It takes a scale factor and vessel volume and returns input
  ## charges and fill volumes.

  ## rownames(inventory) <- inventory$smiles  ## disabled: would keep the identifier as row names
  inventory <- inventory[,-1] ## drop the first (identifier) column,
  ## which is `code` in the example data

  ## volumes and moles are calculated for the given data

  inventory$Vol <- round((inventory$Wt / inventory$Density) , 3)
  inventory$Moles <- round((inventory$Wt / inventory$Mw) , 3)
  inventory$Equivs <- round((inventory$Moles / inventory$Moles[1]) , 3)

  inventory[,paste0(scale_factor,"xWt_kg")] <-  round((((inventory$Wt * scale_factor) / 1000 ) / NoBatches) , 3)
  inventory[,paste(scale_factor,"xVol_L",sep="")] <-  round((((inventory$Vol * scale_factor) / 1000 ) / NoBatches) , 3)

  inventory$PerCentFill <- round((100 * cumsum(inventory[,paste(scale_factor,"xVol_L",sep="")]) / vessel_volume_L) , 2)

  inventory
  ## at which point everything is in place to scale up

}

new.example.data  <- lapply(example.data, scale_up,20e3,454)

> new.example.data[[1]]
    stuff Mw Density Assay Wt   Vol Moles Equivs 20000xWt_kg 20000xVol_L PerCentFill
1  Apples  1       3     8  1 0.333     1      1          20        6.66        1.47
2 Oranges  2       2     9  2 1.000     1      1          40       20.00        5.87
3 Bananas  3       1     1  3 3.000     1      1          60       60.00       19.09

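So, I have scaled the raw data (lab scale, grams) to see whether it fits in a 100-gallon plant vessel (454 L), but the only stage that is scaled correctly is the last one. The other two need those 'fiddle factors', and I need to apply the right 'fiddle factor' at each stage as I loop through the list (possibly with a for loop rather than lapply).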

(P.S. I asked this question before, but I tried to disguise my example too much and only confused Stack Overflow.)

Answer 1

Based on the details in this post and in the other linked question, "Chaining dataframes in a list", here is the solution I came up with:

Extract the weights of the first and last fruit of each stage into a matrix, like this:

wts <- sapply(example.data, function(t) c(t$Wt[1], t$Wt[nrow(t)]), simplify = TRUE)

Define final.wt at the top level, using the value you originally stated:

final.wt <- 20000

Create a scales function that computes the scale factor for each stage:

scales <- function(x, final.wt) {
  ## x: 2 x n matrix; row 1 = first-row Wt, row 2 = last-row Wt of each stage
  n <- ncol(x)
  nscales <- numeric(n)
  ## work backwards from the last stage; no global state is modified
  for (i in n:1) {
    if (i == n) {
      nscales[i] <- final.wt / x[2, i]
    } else {
      nscales[i] <- nscales[i + 1] * x[1, i + 1] / x[2, i]
    }
  }
  nscales
}

This gives you a vector with the scale factor needed for each stage:

scale.fact<-scales(wts,final.wt)
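
As a cross-check, the same backward chain can be written without a loop. This is only a sketch under a few assumptions: first_wt, last_wt and target are illustrative names (not from the original post) holding the first-row Wt, the last-row Wt of each stage, and the value of final.wt, all copied by hand from the example data.

```r
# First-row and last-row Wt of stage1..stage3, copied from the example data.
# first_wt, last_wt and target are illustrative names, not part of the post.
first_wt <- c(1, 45, 24)
last_wt  <- c(3, 2, 84)
target   <- 20000
n <- length(last_wt)

# Link ratio between consecutive stages:
# first Wt of stage i+1 divided by last Wt of stage i
link <- first_wt[-1] / last_wt[-n]

# Factor for the last stage, then accumulate the link ratios backwards
last_factor <- target / last_wt[n]
scale_fact  <- last_factor * c(rev(cumprod(rev(link))), 1)

# Consistency check: the scaled output of each stage (its last row)
# must equal the scaled input of the next stage (its first row)
stopifnot(isTRUE(all.equal(scale_fact[-n] * last_wt[-n],
                           scale_fact[-1] * first_wt[-1])))
```

The final check is the chaining condition from the question: after scaling, the amount of the shared fruit leaving one stage matches the amount entering the next.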

Now you can call scale_up through mapply like this:

mapply(scale_up, example.data, scale.fact, MoreArgs = list(vessel_volume_L = 454), SIMPLIFY = FALSE)

The values in scale.fact are:

42857.14 2857.14 238.10 (rounded to two decimals)

mapply passes each value to scale_factor for the corresponding stage.
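
To make the element-by-element pairing concrete, here is a minimal sketch using Map, which is mapply with SIMPLIFY = FALSE. It assumes trimmed-down copies of the stages (only stuff and Wt) and a hypothetical stand-in function, scale_wt, in place of the full scale_up; neither is part of the original post.

```r
# Trimmed-down copies of the example stages (only the columns needed here)
stages <- list(
  stage1 = data.frame(stuff = c("Apples", "Oranges", "Bananas"),
                      Wt = c(1, 2, 3)),
  stage2 = data.frame(stuff = c("Bananas", "Mango", "Cherry", "Quince", "Gooseberry"),
                      Wt = c(45, 23, 56, 99, 2)),
  stage3 = data.frame(stuff = c("Gooseberry", "Bread", "Grapes", "Butter"),
                      Wt = c(24, 56, 31, 84))
)

# One factor per stage, chained backwards from a target of 20000
scale_fact <- c(20000 * 24 * 45 / (84 * 2 * 3),
                20000 * 24 / (84 * 2),
                20000 / 84)

# Hypothetical stand-in for scale_up: adds one scaled-weight column
scale_wt <- function(inventory, scale_factor) {
  inventory$ScaledWt <- inventory$Wt * scale_factor
  inventory
}

# Map pairs stages[[i]] with scale_fact[i] and always returns a list
scaled <- Map(scale_wt, stages, scale_fact)

# Each stage's last scaled weight should match the next stage's first one
for (i in seq_len(length(scaled) - 1)) {
  out_wt <- scaled[[i]]$ScaledWt[nrow(scaled[[i]])]
  in_wt  <- scaled[[i + 1]]$ScaledWt[1]
  stopifnot(isTRUE(all.equal(out_wt, in_wt)))
}
```

Because the result is a named list of data frames, each scaled stage can be inspected on its own, for example with scaled$stage2.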
