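Part of this question has already been answered in special-group-number-for-each-combination-of-data. In most cases our data contains pairs mixed with other values. What we want is to number the groups wherever such a pair occurs, and to carry that number forward until the next pair.
I have collected the pairs, e.g. c("bad","good"), and want to group by them. For c('Veni',"vidi","Vici"), the unique number 666 should be assigned.
Here is the sample data: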
names <- c(c("bad","good"),1,2,c("good","bad"),111,c("bad","J.James"),c("good","J.James"),333,c("J.James","good"),761,'Veni',"vidi","Vici")
df <- data.frame(names)
The expected output, for this case and in general, is:
     names Group
1      bad     1
2     good     1
3        1     1
4        2     1
5     good     2
6      bad     2
7      111     2
8      bad     3
9  J.James     3
10    good     4
11 J.James     4
12     333     4
13 J.James     5
14    good     5
15     761     5
16    Veni   666
17    vidi   666
18    Vici   666
Answer 0 (score: 1)
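The two approaches below are meant to reproduce the OP's expected result for the given sample dataset.
Both work the same way. First, all "disturbing" rows, i.e., rows that do not contain "valid" names, are skipped, and the rows with "valid" names are simply numbered in groups of two. Here a name counts as "valid" if it is neither numeric nor one of the exempt names. Second, rows with exempt names are given the special group number. Finally, the NA rows are filled by carrying the last observation forward.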
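The three steps can also be traced without any packages. The following is a minimal base R sketch, not part of the original answer: the variable names `is_num`, `valid`, `filled`, and `idx` are illustrative, and the fill step is a hand-rolled stand-in for last-observation-carried-forward.

```r
# Sample data from the question; mixing strings and numbers in c() yields a character vector
names <- c("bad", "good", 1, 2, "good", "bad", 111, "bad", "J.James",
           "good", "J.James", 333, "J.James", "good", 761, "Veni", "vidi", "Vici")
exempt <- c("Veni", "vidi", "Vici")

# Step 1: rows that are neither numeric nor exempt are "valid"; number them in pairs
is_num <- !is.na(suppressWarnings(as.numeric(names)))
valid <- !is_num & !(names %in% exempt)
grp <- rep(NA_integer_, length(names))
grp[valid] <- rep(seq_len(sum(valid)), each = 2L, length.out = sum(valid))

# Step 2: exempt names get the special group number
grp[names %in% exempt] <- 666L

# Step 3: carry the last non-NA value forward; a leading NA (if any) stays NA
filled <- !is.na(grp)
idx <- cumsum(filled)
grp <- c(NA_integer_, grp[filled])[idx + 1L]

df <- data.frame(names, Group = grp)
```

Because the pair counter in step 1 is built only over the valid rows, the numeric rows in between do not consume group numbers.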
data.table
library(data.table)
names <- c(c("bad","good"),1,2,c("good","bad"),111,c("bad","J.James"),c("good","J.James"),333,c("J.James","good"),761,'Veni',"vidi","Vici")
exempt <- c("Veni", "vidi", "Vici")
# number the valid rows in pairs, mark exempt rows with 666, then fill the NA rows (LOCF);
# as.numeric() warns about NAs introduced by coercion, which is expected here
data.table(names)[is.na(as.numeric(names)) & !names %in% exempt,
                  grp := rep(1:.N, each = 2L, length.out = .N)][
                    names %in% exempt, grp := 666L][
                      , grp := zoo::na.locf(grp)][]
      names grp
 1:     bad   1
 2:    good   1
 3:       1   1
 4:       2   1
 5:    good   2
 6:     bad   2
 7:     111   2
 8:     bad   3
 9: J.James   3
10:    good   4
11: J.James   4
12:     333   4
13: J.James   5
14:    good   5
15:     761   5
16:    Veni 666
17:    vidi 666
18:    Vici 666
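If the zoo dependency is unwanted, the fill step could be done with data.table's own `nafill()`, assuming an installed data.table version that provides it. This variant is a sketch rather than the answer's original code: the helper column `is_num` is introduced here only for illustration, and `suppressWarnings()` silences the coercion warning from `as.numeric()`.

```r
library(data.table)

names <- c("bad", "good", 1, 2, "good", "bad", 111, "bad", "J.James",
           "good", "J.James", 333, "J.James", "good", 761, "Veni", "vidi", "Vici")
exempt <- c("Veni", "vidi", "Vici")

dt <- data.table(names)
# Flag numeric-looking strings (hypothetical helper column, removed again below)
dt[, is_num := !is.na(suppressWarnings(as.numeric(names)))]
# Number the valid rows in pairs; .N is the row count of the subset selected in i
dt[!is_num & !names %in% exempt, grp := rep(seq_len(.N), each = 2L, length.out = .N)]
# Exempt names get the special group number
dt[names %in% exempt, grp := 666L]
# Last observation carried forward, without zoo
dt[, grp := nafill(grp, type = "locf")]
dt[, is_num := NULL]
```

Unlike `zoo::na.locf()` with its default `na.rm = TRUE`, `nafill()` always returns a vector of the same length as its input, so a leading NA would stay NA instead of being dropped.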
dplyr / tidyr
I have also tried to provide a dplyr / tidyr solution:
library(dplyr)
# the pair counter must run over the valid rows only: the running count of valid rows,
# plus one and integer-divided by two, gives 1, 1, 2, 2, ...
tibble(names = names) %>%
  mutate(valid = is.na(as.numeric(names)) & !names %in% exempt,
         grp = if_else(valid,
                       (cumsum(valid) + 1L) %/% 2L,
                       if_else(names %in% exempt, 666L, NA_integer_))) %>%
  select(-valid) %>%
  tidyr::fill(grp)
# A tibble: 18 x 2
   names     grp
   <chr>   <int>
 1 bad         1
 2 good        1
 3 1           1
 4 2           1
 5 good        2
 6 bad         2
 7 111         2
 8 bad         3
 9 J.James     3
10 good        4
11 J.James     4
12 333         4
13 J.James     5
14 good        5
15 761         5
16 Veni      666
17 vidi      666
18 Vici      666