Special group number for each pair

Time: 2018-02-21 18:52:01

Tags: r dplyr aggregate

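Part of this problem has already been answered in special-group-number-for-each-combination-of-data. In most cases, our data contains pairs mixed with other values. What we want to achieve is: whenever such a pair occurs, give it a group number, and carry that number forward until the next pair.

I have wrapped each pair in c(), e.g. c("bad","good"), to mark the values I want to group together. The c('Veni',"vidi","Vici") group should be assigned the unique number 666.

Here is the sample data: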

names <- c(c("bad","good"),1,2,c("good","bad"),111,c("bad","J.James"),c("good","J.James"),333,c("J.James","good"),761,'Veni',"vidi","Vici")

df <- data.frame(names)

Here is the expected output, for this example and for the general case:

     names  Group
1      bad    1
2     good    1
3        1    1
4        2    1
5     good    2
6      bad    2
7      111    2
8      bad    3
9  J.James    3
10    good    4
11 J.James    4
12     333    4
13 J.James    5
14    good    5
15     761    5
16    Veni    666
17    vidi    666
18    Vici    666

1 answer:

Answer 0 (score: 1)

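The two approaches below reproduce the OP's expected result for the given sample dataset.

Both work the same way. First, all "disturbing" rows, i.e. rows that do not contain a "valid" name (here: a non-numeric, non-exempt value), are skipped, and the rows that do have a "valid" name are simply numbered in groups of 2. Second, rows with exempt names are given the special group number. Finally, the remaining NA rows are filled by carrying the last observation forward.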
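The three steps can also be sketched in base R without any packages. This is only an illustration and not part of the original answer's code: the helper variables is_number, valid and last_seen are made up here, and the cummax() index trick stands in for zoo::na.locf() under the assumption that the first row is never NA.

```r
# Sample data from the question; c() coerces every element to character
names <- c("bad", "good", 1, 2, "good", "bad", 111, "bad", "J.James",
           "good", "J.James", 333, "J.James", "good", 761,
           "Veni", "vidi", "Vici")
exempt <- c("Veni", "vidi", "Vici")

# Step 1: a "valid" name is non-numeric and not exempt (hypothetical helpers)
is_number <- !is.na(suppressWarnings(as.numeric(names)))
valid <- !is_number & !names %in% exempt

grp <- rep(NA_integer_, length(names))
# Number the valid rows only, two per group: 1, 1, 2, 2, ...
grp[valid] <- rep(seq_len(sum(valid)), each = 2L, length.out = sum(valid))

# Step 2: exempt names get the special group number
grp[names %in% exempt] <- 666L

# Step 3: carry the last non-NA value forward.
# last_seen[i] is the index of the most recent non-NA row at or before i;
# this assumes the first row is not NA, otherwise the index would be 0.
last_seen <- cummax(ifelse(is.na(grp), 0L, seq_along(grp)))
grp <- grp[last_seen]

data.frame(names, grp)
```

The counter in step 1 is built from the valid rows only (sum(valid) of them), not from all rows, which is what keeps the group numbers consecutive.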

data.table

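library(data.table)
names <- c(c("bad","good"),1,2,c("good","bad"),111,c("bad","J.James"),c("good","J.James"),333,c("J.James","good"),761,'Veni',"vidi","Vici")
exempt <- c("Veni", "vidi", "Vici")
# 1. number the non-numeric, non-exempt rows in groups of 2
#    (.N counts only the rows selected in i; as.numeric() warns about NAs, which is expected)
# 2. give the exempt names the special group number 666
# 3. fill the remaining NA rows by carrying the last observation forward
data.table(names)[is.na(as.numeric(names)) & !names %in% exempt, 
                  grp := rep(1:.N, each = 2L, length.out = .N)][
                    names %in% exempt, grp := 666L][
                      , grp := zoo::na.locf(grp)][]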
      names grp
 1:     bad   1
 2:    good   1
 3:       1   1
 4:       2   1
 5:    good   2
 6:     bad   2
 7:     111   2
 8:     bad   3
 9: J.James   3
10:    good   4
11: J.James   4
12:     333   4
13: J.James   5
14:    good   5
15:     761   5
16:    Veni 666
17:    vidi 666
18:    Vici 666

dplyr / tidyr

Here is my attempt at a dplyr / tidyr solution:

library(dplyr)
tibble(names) %>% 
  # count only the valid rows, so that pairs are numbered 1, 1, 2, 2, ...
  mutate(valid = is.na(as.numeric(names)) & !names %in% exempt,
         grp = if_else(valid,
                       (cumsum(valid) + 1L) %/% 2L,
                       if_else(names %in% exempt, 666L, NA_integer_))) %>% 
  select(-valid) %>% 
  tidyr::fill(grp)
# A tibble: 18 x 2
   names     grp
   <chr>   <int>
 1 bad         1
 2 good        1
 3 1           1
 4 2           1
 5 good        2
 6 bad         2
 7 111         2
 8 bad         3
 9 J.James     3
10 good        4
11 J.James     4
12 333         4
13 J.James     5
14 good        5
15 761         5
16 Veni      666
17 vidi      666
18 Vici      666