Question

I have a dataset like this:

library(data.table)
set.seed(71)
dat <- data.table(region = rep(c('A','B'), each=10),
    place = rep(c('C','D'), 10),
    start = sample.int(5, 20, replace = TRUE),
    end = sample.int(10, 20, replace = TRUE),
    count = sample.int(50, 20, replace = TRUE),
    para1 = rnorm(20,3,1),
    para2 = rnorm(20,4,1))

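I want to loop through this data to conditionally generate another table with the following columns: region, place, start, end, count, count0. Each row of dat may produce more than one row in the new table. In the new table, the region, place, and start columns are copied from dat, while the end, count, and count0 columns are generated.

Here are the rules applied when going through each row of dat: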

end = end +1
if (count == 0) {
  count0=0
} else {
  count0=start*para1 + end*para2
}
if (count0>count) {
  count0=count
}
count = count -count0

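I tried a combination of for loops, if statements, and mutate, but could not get it to work.

After going through the first two rows of dat, I expect to get a table like this: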

region  place   start   end       count         count0
     A      C       2     7  6.01673062    17.98326938
     A      C       2     8           0     6.01673062
     A      D       3     2  5.34392419     7.65607581
     A      D       3     3           0     5.34392419

The first two rows of dat that I have are:
region  place   start   end count   para1         para2
     A      C       2     6    24   0.39412969  2.45643
     A      D       3     1    13   0.64372127  2.862456

Answer 1

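Edit: Here is a lazy approach that should still be very fast, at the cost of temporarily creating some rows that are removed at the end. Rather than figuring out how many copies of each row to make, I make a bunch of copies of every row, apply fast vectorized calculations to get the updated end, count, and count0 values, and then drop the rows we don't need.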

library(dplyr); library(tidyr)
output <-
  dat %>%
  mutate(orig_row = row_number()) %>%
  uncount(10) %>%   # I'm assuming here that 10 copies of each row are enough
  group_by(orig_row) %>%
  mutate(row = row_number()) %>%
  mutate(
    end = end + row,
    count0 = pmin(count, start * para1 + end * para2), # Edit #2
    count = count - cumsum(count0)
  ) %>%
  filter(lag(count, default = 0) >= 0) %>%
  mutate(count = pmax(0, count),
         count0 = if_else(count == 0, lag(count), count0))
output


# A tibble: 4 x 10
# Groups:   orig_row [2]
  region place start   end count para1 para2 orig_row   row count0
  <chr>  <chr> <int> <int> <dbl> <dbl> <dbl>    <int> <int>  <dbl>
1 A      C         2     7  6.02 0.394  2.46        1     1  18.0 
2 A      C         2     8  0    0.394  2.46        1     2   6.02
3 A      D         3     2  5.34 0.644  2.86        2     1   7.66
4 A      D         3     3  0    0.644  2.86        2     2   5.34
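
To cross-check the vectorized result, the question's rules can also be applied literally with a plain loop. The following is only a minimal base R sketch, not the method used above. It assumes that the rules repeat for a row until count reaches 0 (inferred from the expected table, not stated in the question), it uses the two rows printed in the question rather than the seeded data, and `expand_row` and `max_steps` are hypothetical names introduced here.

```r
# The two rows displayed in the question, typed in by hand.
dat2 <- data.frame(
  region = c("A", "A"), place = c("C", "D"),
  start = c(2L, 3L), end = c(6L, 1L), count = c(24, 13),
  para1 = c(0.39412969, 0.64372127),
  para2 = c(2.45643, 2.862456),
  stringsAsFactors = FALSE
)

# expand_row() is a hypothetical helper: it applies the rules to one row
# repeatedly and emits one output row per step. Stopping once count is 0
# is an assumption; max_steps guards against a non-terminating loop.
expand_row <- function(region, place, start, end, count, para1, para2,
                       max_steps = 1000L) {
  out <- list()
  for (step in seq_len(max_steps)) {
    end <- end + 1L
    count0 <- if (count == 0) 0 else start * para1 + end * para2
    if (count0 > count) count0 <- count
    count <- count - count0
    out[[step]] <- data.frame(region = region, place = place, start = start,
                              end = end, count = count, count0 = count0,
                              stringsAsFactors = FALSE)
    if (count <= 0) break
  }
  do.call(rbind, out)
}

# Apply the helper to every input row and stack the pieces.
pieces <- lapply(seq_len(nrow(dat2)),
                 function(i) do.call(expand_row, as.list(dat2[i, ])))
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

On these two rows the loop yields four rows whose end, count, and count0 values agree with the table expected in the question. It is slower than the vectorized version on large inputs, but it needs no guess about how many copies of each row are enough.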

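Initial answer:

I think this is in the neighborhood.

Note: I'm not getting the same sample data as you show, and I'm having trouble understanding how the specific numbers in your sample would produce the suggested output. For instance, going by the first row of dat that you show (which differs from what I get), shouldn't the first count0 be 2*0.394 + 6*2.456 = 15.527?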
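
A quick arithmetic check, using only the first row printed in the question, suggests where the two numbers come from: the question's rules increment end before computing count0, so the formula sees end = 7 rather than 6. This is a sketch for verification only, and the variable names are illustrative.

```r
# First row of dat as printed in the question.
start <- 2; end <- 6; para1 <- 0.39412969; para2 <- 2.45643

# count0 computed with end as given (the value queried in the note above).
count0_before <- start * para1 + end * para2        # about 15.52684

# count0 computed after applying `end = end + 1`, as the rules specify.
count0_after  <- start * para1 + (end + 1) * para2  # about 17.98327
```

The second value matches the 17.98326938 shown in the first row of the expected table.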

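My approach here is to calculate count0, work out how many count0's fit into count, then make that many copies of the row, reducing count by count0 with each copy.

library(dplyr); library(tidyr)
output <- dat %>%
  mutate(end = end + 1,
         orig_data = row_number(),
         # count0 is computed once per original row, using the incremented end
         count0 = if_else(count == 0, 0, start*para1 + end*para2),
         # one copy per whole count0 that fits into count, plus one
         copies = 1 + count %/% count0) %>%
  uncount(copies) %>%
  group_by(orig_data) %>%
  mutate(row = row_number() - 1,
         count = count - row * count0)

By the way, my dat comes out differently when initialized with set.seed(71). Can you confirm that your data was initialized as specified in the OP? It will be easier to align if we can start from the same place.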
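
One possible explanation for the mismatch, not confirmed anywhere in this thread, is the R version: R 3.6.0 changed the default sampling algorithm used by `sample()` and `sample.int()` from "Rounding" to "Rejection", so the same seed can give different integers on either side of that release. The sketch below assumes R 3.6.0 or later, where `set.seed()` accepts a `sample.kind` argument. The variable names are illustrative.

```r
# Draw with the current default sampler.
set.seed(71)
draw_current <- sample.int(5, 20, replace = TRUE)

# Draw again with the pre-3.6.0 sampler. R warns that the "Rounding"
# sampler is non-uniform, so the warning is suppressed here.
suppressWarnings(set.seed(71, sample.kind = "Rounding"))
draw_legacy <- sample.int(5, 20, replace = TRUE)

# Restore the modern default so later code is unaffected.
RNGkind(sample.kind = "Rejection")

# Compare the two draws side by side.
rbind(current = draw_current, legacy = draw_legacy)
```

If one of the two draws starts with the start values shown in the question, the difference is likely down to the R version and not the code.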
