Grouping and dividing between columns

Time: 2017-08-15 07:55:52

Tags: r dataframe data.table

This question is a follow-up to Group to Group division.

I currently get the following output:

          date bal        D
 1: 1/31/2013  10       NA
 2: 1/31/2013  11       NA
 3: 1/31/2013  12       NA
 4: 1/31/2013  13       NA
 5: 1/31/2013  14       NA
 6: 2/28/2013  20       NA
 7: 2/28/2013  30 3.000000
 8: 2/28/2013  40 3.636364
 9: 2/28/2013  50 4.166667
10: 2/28/2013  60 4.615385
11: 3/30/2013  10       NA
12: 3/30/2013  11 0.550000
13: 3/30/2013  12 0.400000
14: 3/30/2013  13 0.325000
15: 3/30/2013  15 0.300000

The output above was produced by the following code:

library(data.table)   # CRAN version 1.10.4 used
setDT(bb)[, D := bal / shift(bal, 6L)][seq(1L, nrow(bb), 5L), D := NA][]

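Now my questions are:

  1. In the 4th and 5th position of each group the answer should print 100%, i.e., for rows 9, 10, 14, 15, and so on, the value under D should be 100%.

  2. The values in D should be in %.

  3. Expected output: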

           date     bal    D
     1: 1/31/2013  10      NA
     2: 1/31/2013  11      NA
     3: 1/31/2013  12      NA
     4: 1/31/2013  13 100.00
     5: 1/31/2013  14 100.00
     6: 2/28/2013  20      NA
     7: 2/28/2013  30 300.00
     8: 2/28/2013  40 363.64
     9: 2/28/2013  50 100.00
    10: 2/28/2013  60 100.00
    11: 3/30/2013  10      NA
    12: 3/30/2013  11  55.00
    13: 3/30/2013  12  40.00
    14: 3/30/2013  13 100.00
    15: 3/30/2013  15 100.00
    

This is the expected output.

1 answer:

Answer 0 (score: 2)

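This assumes the same conditions as in my previous answer, i.e., the number of rows is always the same for every date. With this observation, a quite simple solution is to lag the values of bal by 6 rows (one full group of 5 rows plus one position). Because the lag ignores the groups at first, the result D has to be set to NA for the first row of each group, i.e., for rows 1, 6, 11, and so on.

The additional request to manually override specific rows with 1.0 (which prints as 100%) is handled in the same way, by computing the respective row indices.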

library(data.table)
setDT(bb)[, D := formattable::percent(bal / shift(bal, 6L))][seq(1L, .N, 5L), D := NA][
  rep(seq(4L, nrow(bb), 5L), each = 2L) + 0:1, D := 1.0][]
         date bal       D
 1: 1/31/2013  10      NA
 2: 1/31/2013  11      NA
 3: 1/31/2013  12      NA
 4: 1/31/2013  13 100.00%
 5: 1/31/2013  14 100.00%
 6: 2/28/2013  20      NA
 7: 2/28/2013  30 300.00%
 8: 2/28/2013  40 363.64%
 9: 2/28/2013  50 100.00%
10: 2/28/2013  60 100.00%
11: 3/30/2013  10      NA
12: 3/30/2013  11  55.00%
13: 3/30/2013  12  40.00%
14: 3/30/2013  13 100.00%
15: 3/30/2013  15 100.00%
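
To reproduce the result and see how the pieces fit together, the lag and the two index expressions can be evaluated on their own. The following is only an illustrative sketch: the construction of bb is an assumption rebuilt from the table printed in the question, and the helper names lagged, first_rows and override_rows are hypothetical.

```r
library(data.table)

# Sample data rebuilt from the table printed in the question (assumed layout)
bb <- data.frame(
  date = rep(c("1/31/2013", "2/28/2013", "3/30/2013"), each = 5L),
  bal  = c(10, 11, 12, 13, 14, 20, 30, 40, 50, 60, 10, 11, 12, 13, 15)
)

# Lagging bal by 6 rows pairs each row with the value 6 rows earlier;
# the first 6 positions have no partner and become NA
lagged <- shift(bb$bal, 6L)

# First row of each group of 5: these rows get D := NA
first_rows <- seq(1L, nrow(bb), 5L)

# 4th and 5th row of each group of 5: these rows get overridden
override_rows <- rep(seq(4L, nrow(bb), 5L), each = 2L) + 0:1

print(lagged)
print(first_rows)
print(override_rows)
```

The `+ 0:1` relies on R's recycling: each start index is repeated twice, and 0 and 1 are added alternately to yield the 4th and 5th row of every group.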

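Note that the percent function from the formattable package is used. This has the advantage that the values are still numeric and can be used in calculations, but are printed as percentages.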
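
A minimal sketch of that behaviour, assuming the formattable package is installed; the variable names p and ratios are hypothetical and the two input values are ratios taken from the example (11 / 20 and 12 / 30):

```r
library(formattable)

# Two ratios from the example data, stored as fractions
p <- percent(c(0.55, 0.40))

# Printing shows a percent representation
print(p)

# The underlying values are still plain numbers and can be used in arithmetic
ratios <- as.numeric(p)
print(ratios * 100)
```

Because the stored values are fractions, an override of 1.0 is what prints as 100%.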

As requested by the OP, here is also a version that does not use formattable::percent():

setDT(bb)[, D := 100.0 * bal / shift(bal, 6L)][seq(1L, .N, 5L), D := NA][
  rep(seq(4L, nrow(bb), 5L), each = 2L) + 0:1, D := 100.0][]
         date bal        D
 1: 1/31/2013  10       NA
 2: 1/31/2013  11       NA
 3: 1/31/2013  12       NA
 4: 1/31/2013  13 100.0000
 5: 1/31/2013  14 100.0000
 6: 2/28/2013  20       NA
 7: 2/28/2013  30 300.0000
 8: 2/28/2013  40 363.6364
 9: 2/28/2013  50 100.0000
10: 2/28/2013  60 100.0000
11: 3/30/2013  10       NA
12: 3/30/2013  11  55.0000
13: 3/30/2013  12  40.0000
14: 3/30/2013  13 100.0000
15: 3/30/2013  15 100.0000
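
If a percent sign is wanted for display only, one option that is not part of the original answer is to keep D numeric and derive a separate character vector with base R's sprintf(). This is a sketch on a few values copied from the output above; D_label is a hypothetical name.

```r
# A few D values taken from the output above
D <- c(NA, 300, 363.6364, 100, 55)

# Format to two decimals with a trailing percent sign, keeping NA as NA
D_label <- ifelse(is.na(D), NA_character_, sprintf("%.2f%%", D))
print(D_label)
```

The formatted labels are character strings, so any further calculation should use the numeric D column.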

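The OP asked for a dynamic version in which the user can choose which rows in each group are set to 100. I have tried to make a fully flexible version in which the number of elements in each group is dynamic as well (it still has to be the same in all groups) and packaged it as a function: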

divide_by_group <- function(DF, 
                            id_of_rows_in_group_to_override = NA, 
                            val_override = 100.0) {
  library(data.table)
  # check parameters
  checkmate::assert_data_frame(DF)
  checkmate::assert_names(c("date", "bal"), subset.of = names(DF))
  checkmate::assert_number(val_override)
  # retrieve group length, verify all groups have the same length
  l_grp <- setDT(DF)[, .N, by = date][
    , if (any(N != first(N))) stop("Differing group lengths") else first(N)]
  # verify user specified row ids
  checkmate::assert_integerish(id_of_rows_in_group_to_override, lower = 1L, upper = l_grp)
  # compute result
  result <- DF[, D := 100.0 * bal / shift(bal, l_grp + 1L)][seq(1L, .N, l_grp), D := NA]
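  # apply override: compute absolute row numbers by adding the within-group ids
  # to the row offset of each group from the second group onwards
  rn <- c(outer(id_of_rows_in_group_to_override, seq(l_grp, nrow(DF) - l_grp, l_grp), `+`))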
  # verify rn is in range 
  checkmate::assert_integerish(rn, lower = l_grp + 1L, upper = nrow(DF))
  result[rn, D := val_override]
  return(result[])
}

Note that more than 50% of the code is devoted to checking parameters and assumptions.
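
The row-number computation inside the function can be tried in isolation. The sketch below uses assumed example values (3 groups of 5 rows, overriding the 4th and 5th row); n_rows, ids and offsets are hypothetical names standing in for nrow(DF), id_of_rows_in_group_to_override and the group offsets.

```r
# Assumed example values: 3 groups of 5 rows, override the 4th and 5th row
l_grp  <- 5L
n_rows <- 15L
ids    <- 4:5

# Row offsets of the second and later groups: 5 and 10
offsets <- seq(l_grp, n_rows - l_grp, l_grp)

# outer() adds every id to every offset; c() flattens the matrix column-wise
rn <- c(outer(ids, offsets, `+`))
print(rn)
```

Since the offsets start at l_grp, no row of the first group is ever selected, which is why the first group stays NA in the function's output.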

Sample calls (the override is applied from the second group onwards, so the first group remains NA):

divide_by_group(bb)
         date bal        D
 1: 1/31/2013  10       NA
 2: 1/31/2013  11       NA
 3: 1/31/2013  12       NA
 4: 1/31/2013  13       NA
 5: 1/31/2013  14       NA
 6: 2/28/2013  20       NA
 7: 2/28/2013  30 300.0000
 8: 2/28/2013  40 363.6364
 9: 2/28/2013  50 416.6667
10: 2/28/2013  60 461.5385
11: 3/30/2013  10       NA
12: 3/30/2013  11  55.0000
13: 3/30/2013  12  40.0000
14: 3/30/2013  13  32.5000
15: 3/30/2013  15  30.0000
divide_by_group(bb, 4:5)
         date bal        D
 1: 1/31/2013  10       NA
 2: 1/31/2013  11       NA
 3: 1/31/2013  12       NA
 4: 1/31/2013  13       NA
 5: 1/31/2013  14       NA
 6: 2/28/2013  20       NA
 7: 2/28/2013  30 300.0000
 8: 2/28/2013  40 363.6364
 9: 2/28/2013  50 100.0000
10: 2/28/2013  60 100.0000
11: 3/30/2013  10       NA
12: 3/30/2013  11  55.0000
13: 3/30/2013  12  40.0000
14: 3/30/2013  13 100.0000
15: 3/30/2013  15 100.0000
divide_by_group(bb, c(2, 5), -9.9)
         date bal        D
 1: 1/31/2013  10       NA
 2: 1/31/2013  11       NA
 3: 1/31/2013  12       NA
 4: 1/31/2013  13       NA
 5: 1/31/2013  14       NA
 6: 2/28/2013  20       NA
 7: 2/28/2013  30  -9.9000
 8: 2/28/2013  40 363.6364
 9: 2/28/2013  50 416.6667
10: 2/28/2013  60  -9.9000
11: 3/30/2013  10       NA
12: 3/30/2013  11  -9.9000
13: 3/30/2013  12  40.0000
14: 3/30/2013  13  32.5000
15: 3/30/2013  15  -9.9000