Question

My sample data looks like this:

        gros id nr_oriz
     1:   23  1       1
     2:   16  1       2
     3:   14  1       3
     4:   15  1       4
     5:   22  1       5
     6:   30  1       6
     7:   25  2       1
     8:   10  2       2
     9:   13  2       3
    10:   17  2       4
    11:   45  2       5
    12:   25  4       1
    13:   15  4       2
    14:   20  4       3
    15:   20  4       4
    16:   20  4       5

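Here gros is the thickness of each soil horizon, id is the profile number, and nr_oriz is the horizon number. I need to create two columns, top and bottom, where top is the upper boundary of the horizon and bottom is its lower boundary. So far we have only managed to get the bottom values, using: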

topsoil$bottom<-ave(topsoil$gros,topsoil$id,FUN=cumsum)
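To make the example reproducible, the sample above can be rebuilt as a plain data.frame and the same ave() call applied to it. This is only a sketch: the integer column types are an assumption read off the printed output, and the real table may differ.

```r
# Rebuild the sample from the question (integer column types are assumed)
topsoil <- data.frame(
  gros    = c(23L, 16L, 14L, 15L, 22L, 30L, 25L, 10L, 13L, 17L, 45L,
              25L, 15L, 20L, 20L, 20L),
  id      = rep(c(1L, 2L, 4L), times = c(6, 5, 5)),
  nr_oriz = c(1:6, 1:5, 1:5)
)

# Running total of horizon thickness within each profile = lower boundary
topsoil$bottom <- ave(topsoil$gros, topsoil$id, FUN = cumsum)
topsoil$bottom
```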

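For the top values, however, we need to somehow shift the data within each id and compute a cumulative sum that starts at 0 and leaves out the last value, as in this example: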

    gros id nr_oriz top bottom
 1:   23  1       1   0     23
 2:   16  1       2  23     39
 3:   14  1       3  39     53
 4:   15  1       4  53     68
 5:   22  1       5  68     90
 6:   30  1       6  90    120
 7:   25  2       1   0     25
 8:   10  2       2  25     35
 9:   13  2       3  35     48
10:   17  2       4  48     65
11:   45  2       5  65    110
12:   25  4       1   0     25
13:   15  4       2  25     40
14:   20  4       3  40     60
15:   20  4       4  60     80
16:   20  4       5  80    100
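The top column is an exclusive cumulative sum: each value is the total thickness of the horizons above the current one. For a single profile it can be obtained either by shifting the ordinary cumsum down one position and putting 0 first, or by subtracting each horizon's own thickness from its bottom. A minimal base R sketch using the thicknesses of profile 1:

```r
gros <- c(23, 16, 14, 15, 22, 30)     # horizon thicknesses of profile 1

bottom    <- cumsum(gros)             # 23 39 53 68 90 120
top_shift <- c(0, head(bottom, -1))   # prepend 0, drop the last value
top_diff  <- bottom - gros            # same values, no shifting needed

top_shift                             # 0 23 39 53 68 90
identical(top_shift, top_diff)        # TRUE
```

Because bottom - gros needs no shifting, it also works on the whole table at once (topsoil$bottom - topsoil$gros) once bottom has been computed per id, since each row's bottom already includes its own gros.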

Is there a simple solution? The database is very large, so we cannot do this by hand (as we did for the top column in this example).

Answer 1

You can use ave again, but on the bottom column and with a custom function:

topsoil$top <- ave(topsoil$bottom, topsoil$id, FUN=function(x) c(0,x[-length(x)]))
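The custom function moves each value down one row within a profile and puts 0 in the first slot. A small sketch of it in isolation, with the function given the hypothetical name shift_down and applied to a made-up mini table; note that a profile with only one horizon still gets top = 0, because x[-length(x)] is then empty:

```r
# Custom function from the answer, given a hypothetical name:
# prepend 0 and drop the last element
shift_down <- function(x) c(0, x[-length(x)])

shift_down(c(23, 39, 53))   # 0 23 39
shift_down(45)              # 0 (one-horizon profile)

# Hypothetical mini table: profile 1 has three horizons, profile 3 has one
d <- data.frame(gros = c(23, 16, 14, 45), id = c(1, 1, 1, 3))
d$bottom <- ave(d$gros, d$id, FUN = cumsum)
d$top    <- ave(d$bottom, d$id, FUN = shift_down)
d
```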

It looks like you are using the data.table package, so you can adapt the code to take advantage of data.table's syntax and performance. To compute bottom, simply do:

topsoil[, bottom := cumsum(gros), by = id]

Then compute top:

topsoil[, top := c(0L, bottom[-.N]), by = id]
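A self-contained run of the two data.table steps, assuming the data.table package is installed and the sample is typed in by hand (integer columns are an assumption). Using 0L rather than 0 keeps top the same integer type as bottom when gros is integer:

```r
library(data.table)

# Sample data from the question (integer types assumed)
topsoil <- data.table(
  gros    = c(23L, 16L, 14L, 15L, 22L, 30L, 25L, 10L, 13L, 17L, 45L,
              25L, 15L, 20L, 20L, 20L),
  id      = rep(c(1L, 2L, 4L), times = c(6, 5, 5)),
  nr_oriz = c(1:6, 1:5, 1:5)
)

# Lower boundary: running total of thickness within each profile
topsoil[, bottom := cumsum(gros), by = id]
# Upper boundary: .N is the group size, so bottom[-.N] drops the last value
topsoil[, top := c(0L, bottom[-.N]), by = id]

print(topsoil)
```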

Or you can wrap both into a single step, similar to what @akrun's answer illustrates.

Answer 2

You can do this with shift from the development version of data.table. Instructions for installing the devel version are here

library(data.table) # v1.9.5+
setDT(topsoil)[, c('top', 'bottom') := {
  tmp <- cumsum(gros)                              # bottom boundary per profile
  list(top = shift(tmp, fill = 0), bottom = tmp)   # top = bottom lagged by one, 0 first
}, by = id]
topsoil
#    gros id nr_oriz top bottom
# 1:   23  1       1   0     23
# 2:   16  1       2  23     39
# 3:   14  1       3  39     53
# 4:   15  1       4  53     68
# 5:   22  1       5  68     90
# 6:   30  1       6  90    120
# 7:   25  2       1   0     25
# 8:   10  2       2  25     35
# 9:   13  2       3  35     48
#10:   17  2       4  48     65
#11:   45  2       5  65    110
#12:   25  4       1   0     25
#13:   15  4       2  25     40
#14:   20  4       3  40     60
#15:   20  4       4  60     80
#16:   20  4       5  80    100
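For readers new to shift(): by default it lags a vector by one position (n = 1, type = "lag"), and fill sets the value placed in the slot left empty at the start, which is why fill = 0 gives the top boundary directly. A minimal sketch, assuming an installed data.table version that includes shift(), using the bottom values of profile 2:

```r
library(data.table)

tmp <- cumsum(c(25, 10, 13, 17, 45))   # bottom boundaries of profile 2

shift(tmp, fill = 0)                   # 0 25 35 48 65
shift(tmp)                             # NA 25 35 48 65 (default fill is NA)
```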

Answer 3

library(dplyr)
df %>% group_by(id) %>%
       mutate(bottom = cumsum(gros), top = lag(bottom)) %>%
       replace(is.na(.), 0)
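Note that replace(is.na(.), 0) turns every NA in the result into 0, including any genuinely missing gros values. A variant that only fills the first horizon of each profile passes a default to dplyr's lag(); this is a sketch assuming dplyr is installed, with the sample typed in as df:

```r
library(dplyr)

df <- data.frame(
  gros = c(23, 16, 14, 15, 22, 30, 25, 10, 13, 17, 45, 25, 15, 20, 20, 20),
  id   = rep(c(1, 2, 4), times = c(6, 5, 5))
)

res <- df %>%
  group_by(id) %>%
  mutate(bottom = cumsum(gros),
         top    = lag(bottom, default = 0)) %>%  # 0 instead of NA for the first horizon
  ungroup()

res
```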

Calculating a shifted cumulative sum per group, starting at 0
