Converting loop output from a list to a data frame in R

Time: 2017-05-05 03:45:27

Tags: r

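I'm asking this with simplified code (the logic is a bit contrived, but it mirrors my situation); the code I actually use is long and probably not worth the extra words. I'm happy to add whatever is needed to answer the question: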

I have a for loop, for example:

data2 <- data.frame("Chocolate" = c(0.25), "Strawberry" = c(.16),
                "Vanilla" = c(0.64), "Blueberry" = c(.75))

for (i in 1:4) { 
    freqSim <- data.frame(sample(0:1, length(1:100), replace=T, prob = c(1-data2[i],data2[i])))  

    lossCol <- freqSim*(runif(n=100, min=0, max=7000))

    costAvg <- mean(as.numeric(unlist(lossCol)))
    costSD <- sd(as.numeric(unlist(lossCol)))

    costAvg <- formatC(costAvg, format='d', big.mark=",")
    costSD <- formatC(costSD, format='d', big.mark= ",")

    stats <- list()
    stats[[i]] <- list(costAvg,costSD)

    print(stats[[i]])
}

What I get back looks like:

[[1]] 
[1] "1,261" 

[[2]] 
[1] "2,103"

[[1]] 
[1] "313"

[[2]] 
[1] "1,165"

[[1]] 
[1] "2,073"

[[2]] 
[1] "2,206"

[[1]] 
[1] "2,417"

[[2]] 
[1] "2,258"

Ideally, I'd like a matrix that looks like:

          Chocolate    Strawberry   Vanilla   Blueberry
Label A     1,261       313          2,073      2,417  
Label B     2,103       1,165        2,206      2,258

Is there any way to do this short of throwing myself off a cliff?

5 answers:

Answer 0: (score: 1)

We can do this with simplify2array:

res <- simplify2array(stats)
dimnames(res) <- list(paste("Label", c("A", "B")), names(data2))

Note: make sure that

stats <- list()

is defined outside the for loop, so that it is not reset on every iteration.

A better option is to preallocate the list to its final length:

stats <- vector("list", length(data2))
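Putting these notes together, a minimal sketch of the corrected loop might look like the following. It assumes the data2 from the question; set.seed and the use of data2[[i]] to pull out the probability as a plain number are my additions and are not part of the original code.

```r
set.seed(1)  # assumption: added only so the sketch is reproducible

data2 <- data.frame(Chocolate = 0.25, Strawberry = 0.16,
                    Vanilla = 0.64, Blueberry = 0.75)

# Preallocate once, outside the loop, so earlier results are not overwritten
stats <- vector("list", length(data2))

for (i in seq_along(data2)) {
  p <- data2[[i]]  # probability of a loss for this flavour
  freqSim <- sample(0:1, 100, replace = TRUE, prob = c(1 - p, p))
  lossCol <- freqSim * runif(n = 100, min = 0, max = 7000)

  stats[[i]] <- list(formatC(mean(lossCol), format = "d", big.mark = ","),
                     formatC(sd(lossCol), format = "d", big.mark = ","))
}

# Collapse the list of 2-element lists into a 2 x 4 matrix and label it
res <- simplify2array(stats)
dimnames(res) <- list(paste("Label", c("A", "B")), names(data2))
res
```

Because each stats[[i]] is itself a list, res is a matrix whose cells are list elements; storing c(costAvg, costSD) in each slot instead should give a plain character matrix.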

Answer 1: (score: 1)

To get exactly the output table you asked for, try this. I didn't have time to apply proper naming conventions, so please excuse that.

data2 <- data.frame("Chocolate" = c(0.25), "Strawberry" = c(.16),
                    "Vanilla" = c(0.64), "Blueberry" = c(.75))
x = c("Chocolate", "Strawberry", "Vanilla", "Blueberry")
y = c("Label A", "Label B")

data3 = matrix(nrow = 2, ncol = 4)
colnames(data3) = x
row.names(data3) = y

for (i in 1:4) { 
  freqSim <- data.frame(sample(0:1, length(1:100), replace = T, prob = c(1-data2[i],data2[i])))  

  lossCol <- freqSim*(runif(n=100, min=0, max=7000))

  costAvg <- mean(as.numeric(unlist(lossCol)))
  costSD <- sd(as.numeric(unlist(lossCol)))

  costAvg <- formatC(costAvg, format='d', big.mark=",")
  costSD <- formatC(costSD, format='d', big.mark= ",")

  data3[1, i] = costAvg
  data3[2, i] = costSD
}
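As an alternative sketch, and not what this answer itself does, the same labelled matrix could be built without index bookkeeping by using vapply over the columns of data2. The helper name simulate_flavour and the set.seed call are hypothetical additions for illustration.

```r
set.seed(1)  # assumption: added only so the sketch is reproducible

data2 <- data.frame(Chocolate = 0.25, Strawberry = 0.16,
                    Vanilla = 0.64, Blueberry = 0.75)

# Hypothetical helper: simulate 100 losses for one probability p and
# return the formatted mean and standard deviation as a named vector
simulate_flavour <- function(p) {
  loss <- sample(0:1, 100, replace = TRUE, prob = c(1 - p, p)) *
    runif(n = 100, min = 0, max = 7000)
  c("Label A" = formatC(mean(loss), format = "d", big.mark = ","),
    "Label B" = formatC(sd(loss), format = "d", big.mark = ","))
}

# One column per flavour; vapply checks that each result is character(2)
data3 <- vapply(data2, simulate_flavour, character(2))
data3
```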

Answer 2: (score: 1)

Here is a simple fix:

data2 <- data.frame("Chocolate" = c(0.25), "Strawberry" = c(.16),
      "Vanilla" = c(0.64), "Blueberry" = c(.75))

stats <- data.frame( row.names = c("Label A", "Label B") )

for (i in 1:4) { 
    freqSim <- data.frame(sample(0:1, length(1:100), replace=T, 
            prob = c(1-data2[i],data2[i])))  

    lossCol <- freqSim*(runif(n=100, min=0, max=7000))

    costAvg <- mean(as.numeric(unlist(lossCol)))
    costSD <- sd(as.numeric(unlist(lossCol)))

    costAvg <- formatC(costAvg, format='d', big.mark=",")
    costSD <- formatC(costSD, format='d', big.mark= ",")

    stats["Label A", i] <- costAvg
    stats["Label B", i] <- costSD
}

colnames(stats) <- colnames(data2)

Result:

        Chocolate Strawberry Vanilla Blueberry
Label A       764        470   2,003     2,932
Label B     1,674      1,418   2,202     2,315

If you can, I'd suggest considering tidyr for these operations rather than doing them in base R.

Answer 3: (score: 1)

Here is an example with dplyr. It won't give you the matrix you want, but it is a cleaner way to avoid the for loop:

library(dplyr)  # %>%, group_by, summarise, mutate_at
library(tidyr)  # gather

freqSim <- lapply(names(data2), function(x)
                  sample(0:1, length(1:100), replace=T, 
                  prob=c(1-data2[x], data2[x])))
names(freqSim) <- names(data2)

lossCol <- lapply(freqSim, function(x) x*(runif(n=100, min=0, max=7000))) 

do.call(data.frame, lossCol) %>% 
    gather(type, val) %>% 
    group_by(type) %>% 
    summarise(mean=mean(val), sd=sd(val)) %>% 
    mutate_at(.cols=vars(mean, sd), .funs = funs(format(., format="d", big.mark=","))) 

# A tibble: 4 × 3
        type       mean        sd
       <chr>      <chr>     <chr>
1  Blueberry 2,911.8587 2,481.310
2  Chocolate   810.6141 1,820.357
3 Strawberry   680.2027 1,659.491
4    Vanilla 2,302.0011 2,305.148

Answer 4: (score: 1)

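If you really want matrix-format output, you can do this in base R with outer. For example, to compute the mean and median of each column of mtcars, you could do the following: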

> outer(list(mean=mean, median=median), as.data.frame(mtcars), Vectorize(function(f,y) f(y)))
             mpg    cyl       disp       hp      drat      wt     qsec     vs      am   gear   carb
mean   20.090625 6.1875 230.721875 146.6875 3.5965625 3.21725 17.84875 0.4375 0.40625 3.6875 2.8125
median 19.200000 6.0000 196.300000 123.0000 3.6950000 3.32500 17.71000 0.0000 0.00000 4.0000 2.0000

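The first argument to outer is a named list of the functions to apply, the second is the columns to iterate over, and the last is a function that evaluates one of those functions on one column. Vectorize is required here because outer calls that last function once on the fully expanded argument vectors rather than once per pair.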
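To make the role of Vectorize concrete, here is a small sketch on toy data. The names funs, cols and apply_fun are made up for illustration and do not come from the question.

```r
funs <- list(mean = mean, median = median)
cols <- list(a = c(1, 2, 6), b = c(10, 20, 60))

# Vectorize wraps the two-argument helper so it is applied pair by pair
# (via mapply), which is the calling convention outer() expects
apply_fun <- Vectorize(function(f, y) f(y))

res <- outer(funs, cols, apply_fun)
res
# mean of a is 3, median of a is 2, mean of b is 30, median of b is 20
```

The row names come from the names of the function list and the column names from the names of the second list, which is why both arguments are named.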

In your case, I would split your code into three parts:

Generate the samples:

>     freqSim <- lapply(data2, function(x) sample(0:1, length(1:100), replace=T, prob=c(1-x,x)) *(runif(n=100, min=0, max=7000))) 

Which looks like this:

> str(freqSim)
List of 4
 $ Chocolate : num [1:100] 0 0 0 0 0 ...
 $ Strawberry: num [1:100] 0 0 0 0 0 0 0 0 0 0 ...
 $ Vanilla   : num [1:100] 4175 1456 0 1201 852 ...
 $ Blueberry : num [1:100] 0 3896 3794 5096 2901 ...

Declare your functions:

> funs <- list(`Label A`=function(x) formatC(mean(x), format='d', big.mark=","), 
               `Label B`=function(x) formatC(sd(x), format='d', big.mark=",") )

Use outer:

> outer(funs, freqSim, Vectorize(function(f,y) f(y)))
        Chocolate Strawberry Vanilla Blueberry
Label A "518"     "427"      "2,044" "2,441"  
Label B "1,417"   "1,290"    "2,250" "2,259"
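Since the question title asks for a data frame, the character matrix can be converted afterwards. The sketch below rebuilds the matrix from the values printed above under the hypothetical name res, then converts it; stripping the thousands separators to recover numbers is an optional extra step I am assuming may be useful.

```r
# Values copied from the printed output above; `res` is a hypothetical name
res <- matrix(c("518", "1,417", "427", "1,290",
                "2,044", "2,250", "2,441", "2,259"),
              nrow = 2,
              dimnames = list(c("Label A", "Label B"),
                              c("Chocolate", "Strawberry", "Vanilla", "Blueberry")))

# Character data frame with the same row and column labels
df <- as.data.frame(res, stringsAsFactors = FALSE)

# Optional: drop the "," separators to get numeric columns back
df_num <- df
df_num[] <- lapply(df, function(col) as.numeric(gsub(",", "", col)))
df_num
```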