I have an example data frame:
library(dplyr)
x <- data.frame(x = c(1, 1, 2, 2, 3, 3, 4, 4, 1),
                y = c("a", "a", "b", "b", "c", "c", "d", "d", "z"))
I can easily get row_number() by group:
x %>%
  group_by(x) %>%
  mutate(id = row_number())
x y id
1 a 1
1 a 2
2 b 1
2 b 2
3 c 1
3 c 2
4 d 1
4 d 2
1 z 3
What I want is for identical combinations of x$x and x$y to be given the same number, for example:
x y id
1 a 1
1 a 1
2 b 1
2 b 1
3 c 1
3 c 1
4 d 1
4 d 1
1 z 2
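so that any rows i and j with c(x$x[i], x$y[i]) equal to c(x$x[j], x$y[j]) get the same value in the new column, with the numbering restarting for each value of x.
How can I do this in dplyr?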
Answer 0 (score: 1)
Another possible option:
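library(dplyr)
# tibble() replaces the deprecated data_frame()
x <- tibble(x = c(1, 1, 2, 2, 3, 3, 4, 4, 1),
            y = c("a", "a", "b", "b", "c", "c", "d", "d", "z"))
x %>%
  group_by(x, y) %>%
  # one row per (x, y) combination, keeping the original rows in a list column
  summarise(y_list = list(y)) %>%
  group_by(x) %>%
  # number the distinct y values within each x
  mutate(id = row_number()) %>%
  # expand back to one row per original observation
  tidyr::unnest(y_list) %>%
  select(-y_list)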
#output
x y id
<dbl> <chr> <int>
1 1 a 1
2 1 a 1
3 1 z 2
4 2 b 1
5 2 b 1
6 3 c 1
7 3 c 1
8 4 d 1
9 4 d 1
Answer 1 (score: 1)
Here is another solution using factor:
## levels = unique(y) numbers the levels of y by order of appearance rather than alphabetically
## (x is the data frame from the question)
x %>% group_by(x) %>% mutate(id = as.numeric(factor(y, levels = unique(y))))
It returns:
x y id
<dbl> <chr> <dbl>
1 1 a 1
2 1 a 1
3 2 b 1
4 2 b 1
5 3 c 1
6 3 c 1
7 4 d 1
8 4 d 1
9 1 z 2
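To see why the levels = unique(y) argument matters, here is a minimal base R sketch. The vector below is hypothetical (it is not taken from the question) and was chosen so that the order of appearance differs from alphabetical order:

```r
# Hypothetical vector: "z" appears before "a"
y <- c("z", "z", "a")

# By default factor() sorts its levels, so "a" becomes level 1
default_ids <- as.numeric(factor(y))

# With levels = unique(y), levels follow order of first appearance,
# so "z" becomes level 1
appearance_ids <- as.numeric(factor(y, levels = unique(y)))

print(default_ids)     # 2 2 1
print(appearance_ids)  # 1 1 2
```

In the question's data the two orders happen to coincide within every group, so the argument only changes the result for data like the vector above.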
Answer 2 (score: 1)
Another option we can use is match:
library(dplyr)
x %>%
  group_by(x) %>%
  mutate(id = match(y, unique(y)))
# A tibble: 9 x 3
# Groups: x [4]
# x y id
# <dbl> <fctr> <int>
#1 1 a 1
#2 1 a 1
#3 2 b 1
#4 2 b 1
#5 3 c 1
#6 3 c 1
#7 4 d 1
#8 4 d 1
#9 1 z 2
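As a rough illustration of what happens inside a single group, the following base R sketch applies the same match/unique idea to the y values of the x == 1 group from the question. The variable names distinct_y and ids are only illustrative:

```r
# y values of the x == 1 group in the question's data
y <- c("a", "a", "z")

# unique() keeps the distinct values in order of first appearance
distinct_y <- unique(y)

# match() gives, for each element of y, its position in distinct_y
ids <- match(y, distinct_y)

print(distinct_y)  # "a" "z"
print(ids)         # 1 1 2
```

Because match() returns positions, the ids come out as integers, which agrees with the <int> column type in the output above.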
Answer 3 (score: 0)
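x %>%
  # sort so that identical (x, y) pairs are adjacent
  arrange(x, y) %>%
  group_by(x) %>%
  # within each x, compare every y with the previous one;
  # the first row of a group is compared with itself, so the count starts at 1
  mutate(yid = lag(as.character(y), default = first(as.character(y))),
         id = cumsum(as.character(y) != yid) + 1) %>%
  # drop the helper column
  mutate(yid = NULL)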
This produces:
x y id
<dbl> <fctr> <dbl>
1 1 a 1
2 1 a 1
3 1 z 2
4 2 b 1
5 2 b 1
6 3 c 1
7 3 c 1
8 4 d 1
9 4 d 1