I have this data.frame:
df = data.frame(a = c(1,1,2,2,3,3), b = c(1:6), c = c(1,2,3,5,7,8))
a b c
-----
1 1 1
1 2 2
2 3 3
2 4 5
3 5 7
3 6 8
For each value of variable a, I want to keep a single new variable d that is the unique union of variables b and c:
a d
---
1 1
1 2
2 3
2 4
2 5
3 5
3 6
3 7
3 8
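Something like this returns an error, of course, because mutate() needs d to have as many values as there are rows in each group (or a single value), and the union for a = 2 has three values while the group has only two rows: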
library(dplyr)
df %>%
group_by(a) %>%
mutate(d = union(b, c))
Does anyone have an elegant solution? Thanks!
Answer 0 (score: 3)
I would suggest "data.table" for this:
library(data.table)
unique(as.data.table(df)[, list(d = unlist(.SD)), by = a])
# a d
# 1: 1 1
# 2: 1 2
# 3: 2 3
# 4: 2 4
# 5: 2 5
# 6: 3 5
# 7: 3 6
# 8: 3 7
# 9: 3 8
I suppose a similar approach in "dplyr" would also make use of "tidyr", something like this:
library(dplyr)
library(tidyr)
df %>%
gather(var, d, b:c) %>%
select(-var) %>%
unique
# a d
# 1 1 1
# 2 1 2
# 3 2 3
# 4 2 4
# 5 3 5
# 6 3 6
# 10 2 5
# 11 3 7
# 12 3 8
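With more recent package versions, the same idea can probably be written with newer verbs. The following is only a sketch, assuming tidyr >= 1.0.0 (for pivot_longer(), which supersedes gather()) and dplyr >= 1.1.0 (for reframe()); the names res_long and res_union are just illustrative and not part of the original answer.

```r
library(dplyr)
library(tidyr)

df <- data.frame(a = c(1, 1, 2, 2, 3, 3), b = 1:6, c = c(1, 2, 3, 5, 7, 8))

# Option 1: stack b and c into one column d, then keep one row per (a, d) pair.
# distinct(a, d) also drops the helper "name" column created by pivot_longer().
res_long <- df %>%
  pivot_longer(c(b, c), values_to = "d") %>%
  distinct(a, d) %>%
  arrange(a, d)

# Option 2: take the union within each group directly.
# Unlike mutate(), reframe() accepts any number of result rows per group
# (assumption: dplyr >= 1.1.0 is installed).
res_union <- df %>%
  group_by(a) %>%
  reframe(d = union(b, c))
```

Option 2 is the closest to the attempt in the question, with mutate() swapped for reframe(). Option 1 needs the arrange() step only if the sorted order matters, since pivot_longer() interleaves the b and c values row by row.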