I have a data frame like this:
library(dplyr)
set.seed(5)
# generate sample data
df <- data.frame(value = 1:10,
type = sample(LETTERS, 10))
value type
1 1 B
2 2 K
3 3 O
4 4 Y
5 5 I
6 6 U
7 7 G
8 8 S
9 9 C
10 10 F
I want to assign each value of the "type" column to a group, using the categories defined in a list:
groups <- list(LETTERS[1:7],
LETTERS[8:15],
LETTERS[16:20],
"other")
print(groups)
# [[1]]
# [1] "A" "B" "C" "D" "E" "F" "G"
#
# [[2]]
# [1] "H" "I" "J" "K" "L" "M" "N" "O"
#
# [[3]]
# [1] "P" "Q" "R" "S" "T"
#
# [[4]]
# [1] "other"
The output should be:
value type group
1 1 B 1
2 2 K 2
3 3 O 2
4 4 Y other
5 5 I 2
6 6 U other
7 7 G 1
8 8 S 3
9 9 C 1
10 10 F 1
My approach is as follows:
# group data
df_grouped <- df %>%
mutate(group = ifelse(type %in% groups[[1]], 1,
ifelse(type %in% groups[[2]], 2,
ifelse(type %in% groups[[3]], 3, "other"))))
Since I have many more groups, I don't like the nested ifelse calls in my code; they are hard to maintain. Is there a more efficient way to do this?
Answer 0 (score: 3)
A simple approach is to convert groups to a data frame with reshape2::melt and then do a left_join:
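library(dplyr)
library(tidyr)
library(reshape2)
left_join(df, melt(groups), by = c(type = "value")) %>%
  # L1 is an integer group index; make it character so "other" can be filled in
  mutate(L1 = as.character(L1)) %>%
  replace_na(list(L1 = "other")) %>%
  rename(group = L1)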
#> value type group
#> 1 1 B 1
#> 2 2 K 2
#> 3 3 O 2
#> 4 4 Y other
#> 5 5 I 2
#> 6 6 U other
#> 7 7 G 1
#> 8 8 S 3
#> 9 9 C 1
#> 10 10 F 1
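A base R approach that gives the same result would be:
# for each type, find the index of the group that contains it, else "other"
df$group <- sapply(df$type, function(s) {
  i <- which(sapply(groups, function(g) s %in% g))
  if (length(i) < 1) "other" else i
})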
Answer 1 (score: 2)
We can use enframe (from tibble) together with a join.
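The code for this answer is missing from the page. Below is a minimal sketch of what an enframe-plus-join approach could look like, assuming the intended functions are tibble::enframe(), tidyr::unnest() and dplyr::left_join(); it is a reconstruction, not necessarily the answerer's code. The data frame is rebuilt from the printed sample so the result does not depend on sample():

```r
library(dplyr)
library(tibble)
library(tidyr)

# sample data copied from the printed data frame in the question
df <- data.frame(value = 1:10,
                 type = c("B", "K", "O", "Y", "I", "U", "G", "S", "C", "F"),
                 stringsAsFactors = FALSE)
groups <- list(LETTERS[1:7], LETTERS[8:15], LETTERS[16:20], "other")

# enframe() turns the unnamed list into a tibble with an integer `group`
# column (1, 2, ...) and a list-column `type`; unnest() makes it long,
# one row per letter
lookup <- enframe(groups, name = "group", value = "type") %>%
  unnest(cols = type)

# left_join() keeps every row of df in its original order; types that
# match no group get NA, which is then replaced by "other"
df_grouped <- df %>%
  left_join(lookup, by = "type") %>%
  mutate(group = coalesce(as.character(group), "other"))
```

Because enframe() numbers an unnamed list 1, 2, ..., naming the list elements instead would put those names into the group column.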
Answer 2 (score: 2)
Here is an option using stack + merge:
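# stack the named list into a long table (columns: values, ind) and left-merge it
out <- type.convert(merge(df, stack(setNames(groups, seq_along(groups))),
                          by.x = "type", by.y = "values", all.x = TRUE),
                    as.is = TRUE)
# fill unmatched groups with "other", then restore the original row order
replace(out, is.na(out), "other")[match(df$value, out$value), ]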
which gives
type value ind
1 B 1 1
6 K 2 2
7 O 3 2
10 Y 4 other
5 I 5 2
9 U 6 other
4 G 7 1
8 S 8 3
2 C 9 1
3 F 10 1
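A variant of the same idea in base R, sketched here as an alternative rather than taken from the answer: build the stack() lookup table once and index it with match(). match() returns positions in the order of df$type, so the reordering step needed after merge() goes away:

```r
# sample data copied from the printed data frame in the question
df <- data.frame(value = 1:10,
                 type = c("B", "K", "O", "Y", "I", "U", "G", "S", "C", "F"),
                 stringsAsFactors = FALSE)
groups <- list(LETTERS[1:7], LETTERS[8:15], LETTERS[16:20], "other")

# stack() turns the named list into a long data frame with columns
# `values` (the letters) and `ind` (a factor holding the list names)
lookup <- stack(setNames(groups, seq_along(groups)))

# match() finds each type's row in the lookup table, keeping df's row order;
# types that appear in no group give NA and are relabelled "other"
df$group <- as.character(lookup$ind)[match(df$type, lookup$values)]
df$group[is.na(df$group)] <- "other"
```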
Answer 3 (score: 1)
Convert the list to a named vector and use a standard lookup:
# named vector: names are the letters, values are the group indices
lookup = setNames(rep(seq_along(groups), lengths(groups)), unlist(groups))
v = lookup[as.character(df$type)]
df$group = replace(v, is.na(v), "other")
Another base R alternative: rename the factor levels with a named list:
df$group = factor(df$type)
levels(df$group) = setNames(groups, seq_along(groups))
The "other" group is now represented by NA. If you want to change that:
df$group = as.character(df$group)
df$group[is.na(df$group)] = "other"
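If the mapping is needed in several places, the named-vector lookup from this answer can be wrapped in a small function. assign_group() below is a hypothetical helper name, not part of any package; it uses the list names as group labels when the list is named and the positions otherwise:

```r
# hypothetical helper: label each element of x with the group that contains it
assign_group <- function(x, groups, other = "other") {
  # group labels: list names if present, otherwise 1, 2, ...
  ids <- if (is.null(names(groups))) seq_along(groups) else names(groups)
  # named character vector: names are the members, values are the labels
  lookup <- setNames(rep(as.character(ids), lengths(groups)), unlist(groups))
  out <- unname(lookup[as.character(x)])
  out[is.na(out)] <- other
  out
}

# sample data copied from the printed data frame in the question
df <- data.frame(value = 1:10,
                 type = c("B", "K", "O", "Y", "I", "U", "G", "S", "C", "F"),
                 stringsAsFactors = FALSE)
groups <- list(LETTERS[1:7], LETTERS[8:15], LETTERS[16:20], "other")

df$group <- assign_group(df$type, groups)
```

Adding a group then only means adding an element to groups; the lookup code stays the same. If a value appears in more than one group, indexing by name returns the first match, so the earlier group wins.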