I have a data frame with one factor column and one numeric column, as shown below.
x <- data.frame(animals = c("Cat", "Dog", "Cat",
                            "Elephant", "Cat", "Zebra",
                            "Cow", "Cow", "Sheep"),
                number = c(12, 5, 19, 6, 1, 20, 3, 11, 4),
                stringsAsFactors = TRUE)  # R >= 4.0 keeps strings as character unless asked
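I want to change this data frame so that any animal that appears fewer than 2 times in the "animals" column is renamed to "Other", giving the following: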
data.frame(c("Cat", "Other", "Cat",
"Other", "Cat", "Other",
"Cow", "Cow", "Other"),
c(12, 5, 19, 6, 1, 20, 3, 11, 4))
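I have worked out how to find which animals have a frequency below 2 using the code below, but I can't work out how to change the names of the animals those counts belong to. Any comments would be much appreciated!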
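# count() with a quoted column name and a "freq" result column is assumed to be plyr's
x.count <- plyr::count(x, "animals")
which(x.count$freq < 2)  # row positions in x.count of animals seen fewer than 2 times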
Answer 0 (score: 0)
We can use ifelse within each group:
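library(tidyverse)
n <- 2  # minimum frequency; animals seen fewer than n times become "Other"
x %>%
  group_by(animals) %>%
  # n() is the size of the current group; the bare n is the threshold above
  mutate(animals1 = as.character(animals),
         animals1 = ifelse(n() < n, "Other", animals1)) %>%
  ungroup() %>%
  select(animals = animals1, number)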
# A tibble: 9 x 2
# animals number
# <chr> <dbl>
#1 Cat 12
#2 Other 5
#3 Cat 19
#4 Other 6
#5 Cat 1
#6 Other 20
#7 Cow 3
#8 Cow 11
#9 Other 4
Or with base R:
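x$animals <- factor(x$animals)  # no-op if already a factor; needed when the column is character
# TRUE for every row whose animal appears fewer than n times
i1 <- with(x, ave(seq_along(animals), animals, FUN = length) < n)
levels(x$animals) <- c(levels(x$animals), "Other")  # "Other" must be a level before assigning it
x$animals[i1] <- "Other"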
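The question's own idea of counting first and then relabelling can also be carried through directly. The following is only a minimal base R sketch under that assumption: it counts with table() instead of count(), and the names min_count, animals_chr, counts and rare are hypothetical, introduced here for illustration.

```r
# Rebuild the example data so the sketch is self-contained.
x <- data.frame(
  animals = c("Cat", "Dog", "Cat", "Elephant", "Cat",
              "Zebra", "Cow", "Cow", "Sheep"),
  number = c(12, 5, 19, 6, 1, 20, 3, 11, 4),
  stringsAsFactors = TRUE
)

min_count <- 2  # hypothetical name for the frequency threshold

# Work on a character copy so no factor levels need to be added by hand.
animals_chr <- as.character(x$animals)

# Frequency of each animal, then the names of those below the threshold.
counts <- table(animals_chr)
rare <- names(counts)[counts < min_count]

# Relabel the rare animals and store the result back as a factor.
animals_chr[animals_chr %in% rare] <- "Other"
x$animals <- factor(animals_chr)
```

Converting to character before relabelling sidesteps the need to extend the factor levels, and rebuilding the factor at the end leaves only the levels that are actually used.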