Create a data frame whose rows each sum to 100

Asked: 2019-01-17 17:39:07

Tags: r

Here is my first attempt:

library(dplyr)

step_size <- 5

grid <- expand.grid(
    x1 = seq(0, 100, step_size)
    , x2 = seq(0, 100, step_size)
    , x3 = seq(0, 100, step_size)
)

grid$sum = grid$x1 + grid$x2 + grid$x3
grid$x1 <- (grid$x1 / grid$sum) * 100
grid$x2 <- (grid$x2 / grid$sum) * 100
grid$x3 <- (grid$x3 / grid$sum) * 100
grid$sum <- grid$x1 + grid$x2 + grid$x3

nrow(grid)

result <- distinct(grid) %>% filter(!is.na(sum))

head(result, 20)
nrow(result)

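Basically, I want to create a data frame with as many rows as possible, where each row sums to 100 and the values are evenly distributed.

Is there a simpler, better way to do this in R? Thanks!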

2 answers:

Answer 0: (score: 1)

Using data.table ...

library(data.table)

grid <- expand.grid(
  x1 = seq(0, 100)
  , x2 = seq(0, 100)
  , x3 = seq(0, 100)
)

setDT(grid)

res <- grid[grid[, rowSums(.SD) == 100], ]
res[, summation := rowSums(.SD)]

Result:

> res[, unique(summation)]
[1] 100

This can also be done in base R, but data.table is somewhat faster in the benchmark below:

library(data.table)

grid <- expand.grid(
  x1 = seq(0, 100)
  , x2 = seq(0, 100)
  , x3 = seq(0, 100)
)


grid2 <- expand.grid(
  x1 = seq(0, 100)
  , x2 = seq(0, 100)
  , x3 = seq(0, 100)
)

setDT(grid)

microbenchmark::microbenchmark(
  data.table = {        
    res <- grid[grid[, rowSums(.SD) == 100], ]
  },
  base = {
    res2 <- grid2[rowSums(grid2) == 100, ]
  }
)

Unit: milliseconds
       expr      min       lq     mean   median       uq      max neval cld
 data.table 59.41157  89.6700 109.0462 107.7415 124.2675 183.9730   100  a 
       base 65.70521 109.6471 154.1312 125.4238 156.9168 611.0169   100   b
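
As a possible refinement of the grid approach above, the third column can be derived rather than enumerated, so only the first two columns need a grid and no filtering over the full three-way grid is required. This is a minimal base R sketch, not part of the original answer; the helper name `simplex_grid` and its `step` and `total` parameters are hypothetical, and it assumes `step` divides `total` evenly.

```r
# Hypothetical helper: enumerate all (x1, x2, x3) on a regular grid
# whose rows sum exactly to `total`.
# Assumption: `step` divides `total` evenly, so x3 also lands on the grid.
simplex_grid <- function(step = 5, total = 100) {
  vals <- seq(0, total, by = step)
  # Grid over the first two columns only
  g <- expand.grid(x1 = vals, x2 = vals)
  # The third column is whatever remains after x1 and x2
  g$x3 <- total - g$x1 - g$x2
  # Drop combinations where x1 + x2 already exceeds the total
  g <- g[g$x3 >= 0, ]
  rownames(g) <- NULL
  g
}

res <- simplex_grid(step = 5)
nrow(res)                  # 231 rows for step = 5
all(rowSums(res) == 100)   # TRUE
```

With `step = 1` the same call builds a 101 x 101 intermediate grid instead of the 101 x 101 x 101 one used above.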

Answer 1: (score: 1)

Here is a simple function. You can specify the number of rows and columns you want, and the value each row should sum to.

func <- function(cols = 3, rows = 10, rowTotal = 100) {
  dt1 <- replicate(n = cols, runif(n = rows))
  dt1 <- data.frame(apply(X = dt1, MARGIN = 2, FUN = function(x) x / rowSums(dt1) * rowTotal))
  return(dt1)
}

rowSums(func()) # default values (3 cols, 10 rows, each row sums to 100)
rowSums(func(cols = 5, rows = 10, rowTotal = 50)) # 5 cols, 10 rows, each row sums to 50
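
The same row-normalization idea can probably be written without `apply()`, relying on R recycling the row sums down each column of a matrix. The following is a sketch under that assumption rather than the answerer's code; the name `random_rows` is hypothetical, and `set.seed()` is there only to make the example reproducible.

```r
# Hypothetical variant of func(): same idea, without apply()
random_rows <- function(cols = 3, rows = 10, rowTotal = 100) {
  # Build the matrix explicitly so its shape is always rows x cols
  m <- matrix(runif(rows * cols), nrow = rows, ncol = cols)
  # rowSums(m) has length `rows`; R recycles it down each column,
  # so every element is divided by the sum of its own row
  as.data.frame(m / rowSums(m) * rowTotal)
}

set.seed(42)  # reproducibility only
out <- random_rows(cols = 4, rows = 6, rowTotal = 100)
# Compare with a tolerance, since the row sums are floating-point values
all.equal(rowSums(out), rep(100, 6), check.attributes = FALSE)
```

Building the matrix with `matrix()` instead of `replicate()` also keeps the `rows = 1` case two-dimensional, so `rowSums()` still works there.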