Question

My data looks like this (the real dataset has more variables and groups):

group   x   time
1   0   1636
1   0   1637
1   0   1638
1   1   1639
1   1   1640
1   1   1641
1   1   1642
2   0   1683
2   0   1684
2   0   1685
2   0   1686
2   0   1687
2   0   1688
2   1   1689
2   1   1690
2   1   1691
3   0   1638
3   1   1639
3   1   1640

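Each group has its own time series (indicated by time). What I need is a fixed number of observations before and after x takes the value 1 within a given group. For example, always the 3 observations before x takes the value 1 and the 3 observations from that point on (so 3 observations before and 3 after). If a group does not have enough observations before or after, I want to drop that group's time series.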

The data should then look like this:

group   x   time
1   0   1636
1   0   1637
1   0   1638
1   1   1639
1   1   1640
1   1   1641
2   0   1686
2   0   1687
2   0   1688
2   1   1689
2   1   1690
2   1   1691

Any suggestions on how to do this?

Answer 1

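We create a unique grouping variable g from group and x with group_indices(). We then filter out the groups with fewer than 3 observations and, for rows where x != 1, keep only those whose row_number() is %in% the range from n() (the group size) down to n()-2, so that only the 3 observations before the change are kept.

library(dplyr)
df %>%
  # one id per (group, x) combination; group_indices_() is deprecated in recent dplyr releases
  mutate(g = group_indices_(., .dots = c("group", "x"))) %>%
  group_by(g) %>%
  # number the x == 0 rows within each g; x == 1 rows get NA
  mutate(condition = ifelse(x == 1, NA, row_number())) %>%
  # drop blocks with fewer than 3 rows; keep all x == 1 rows and the last 3 x == 0 rows
  filter(n() >= 3, ifelse(is.na(condition), TRUE, condition %in% n():(n()-2)))

This gives: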

#Source: local data frame [13 x 5]
#Groups: g [4]
#
#   group     x  time     g condition
#   <int> <int> <int> <int>     <int>
#1      1     0  1636     1         1
#2      1     0  1637     1         2
#3      1     0  1638     1         3
#4      1     1  1639     2        NA
#5      1     1  1640     2        NA
#6      1     1  1641     2        NA
#7      1     1  1642     2        NA
#8      2     0  1686     3         4
#9      2     0  1687     3         5
#10     2     0  1688     3         6
#11     2     1  1689     4        NA
#12     2     1  1690     4        NA
#13     2     1  1691     4        NA

Optionally, you can drop the g and condition columns by adding select(-(g:condition)) to the chain (call ungroup() first, since select() keeps grouping columns).

Data

df <- structure(list(
  group = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L),
  x = c(0L, 0L, 0L, 1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L),
  time = c(1636L, 1637L, 1638L, 1639L, 1640L, 1641L, 1642L, 1683L, 1684L, 1685L, 1686L, 1687L, 1688L, 1689L, 1690L, 1691L, 1638L, 1639L, 1640L)),
  .Names = c("group", "x", "time"), class = "data.frame", row.names = c(NA, -19L))
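The answer's output still contains a fourth x == 1 row for group 1 (time 1642), while the desired output in the question stops at 3. As a possible variant, not the answer author's method, the sketch below trims both sides of the change point. It assumes that x switches from 0 to 1 once per group and that the "after" window starts at the first x == 1 row; the names k, change, pos and result are hypothetical helpers introduced only for this illustration.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  group = rep(1:3, times = c(7, 9, 3)),
  x     = c(0, 0, 0, 1, 1, 1, 1,
            0, 0, 0, 0, 0, 0, 1, 1, 1,
            0, 1, 1),
  time  = c(1636:1642, 1683:1691, 1638:1640)
)

k <- 3  # hypothetical window size: rows kept on each side of the change

result <- df %>%
  group_by(group) %>%
  arrange(time, .by_group = TRUE) %>%
  mutate(
    change = match(1, x),           # position of the first x == 1 (NA if none)
    pos    = row_number() - change  # offset of each row relative to the change
  ) %>%
  # drop groups lacking k rows before the change or k rows from it onward
  filter(!is.na(change), change > k, n() - change + 1 >= k) %>%
  # keep the k rows before the change and the k rows starting at it
  filter(pos >= -k, pos < k) %>%
  ungroup() %>%
  select(group, x, time)

print(result)
```

With the example data this should leave times 1636 to 1641 for group 1 and 1686 to 1691 for group 2, and drop group 3 because it has only one observation before the change.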
