Rename factor levels in a data frame when a level's frequency is below a given number

Time: 2018-12-07 18:52:44

Tags: r

I have a data frame with one factor column and one numeric column, as shown below.

x <- data.frame(c("Cat", "Dog", "Cat",
                  "Elephant", "Cat", "Zebra",
                  "Cow", "Cow", "Sheep"),
                c(12, 5, 19, 6, 1, 20, 3, 11, 4),
                stringsAsFactors = TRUE)  # keep animals a factor (R >= 4.0 defaults to character)
colnames(x) <- c("animals", "number")

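I want to change this data frame so that any animal in the "animals" column that occurs fewer than 2 times is renamed to "Other", giving the following: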

data.frame(c("Cat", "Other", "Cat",
             "Other", "Cat", "Other",
             "Cow", "Cow", "Other"),
           c(12, 5, 19, 6, 1, 20, 3, 11, 4))

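I have worked out how to find which animals have a frequency below 2 using the code below, but I can't work out how to rename the values associated with them. Any comments would be greatly appreciated!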

library(plyr)  # count(df, "column") with a freq column is plyr's count()
x.count <- count(x, "animals")
which(x.count$freq < 2)

1 answer:

Answer 0: (score: 0)

We can use ifelse

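library(tidyverse)
n <- 2  # minimum frequency; rarer animals become "Other"
x %>% 
  group_by(animals) %>% 
  # n() is the size of the current group; the plain n is the threshold above
  mutate(animals1 = as.character(animals),
         animals1 = ifelse(n() < n, "Other", animals1)) %>%
  ungroup() %>%
  select(animals = animals1, number)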
# A tibble: 9 x 2
#  animals number
#  <chr>    <dbl>
#1 Cat         12
#2 Other        5
#3 Cat         19
#4 Other        6
#5 Cat          1
#6 Other       20
#7 Cow          3
#8 Cow         11
#9 Other        4
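
As an alternative sketch, not part of the original answer: forcats (attached by the tidyverse) provides fct_lump_min(), which collapses levels that appear fewer than min times into a single level. The data is rebuilt under the assumed name x2 so the snippet is self-contained and leaves x untouched; min_count is likewise a name chosen only for illustration.

```r
library(forcats)

# Assumed copy of the question's data, kept separate from x
x2 <- data.frame(
  animals = factor(c("Cat", "Dog", "Cat", "Elephant", "Cat",
                     "Zebra", "Cow", "Cow", "Sheep")),
  number = c(12, 5, 19, 6, 1, 20, 3, 11, 4)
)

min_count <- 2  # hypothetical name for the frequency threshold

# Levels seen fewer than min_count times are collapsed into "Other"
x2$animals <- fct_lump_min(x2$animals, min = min_count, other_level = "Other")
```

The result stays a factor, and the rare levels are dropped rather than left behind as unused levels.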

Or with base R

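# TRUE for rows whose animal occurs fewer than n times (n <- 2 as above)
i1 <- with(x, ave(seq_along(animals), animals, FUN = length) < n)
# "Other" must exist as a level before it can be assigned into a factor
levels(x$animals) <- c(levels(x$animals), "Other")
x$animals[i1] <- "Other"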
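
The count-then-filter idea from the question can also be finished with table() and %in%. This is only a minimal sketch under stated assumptions: the data is rebuilt as x3 with a character column (so no level bookkeeping is needed), and threshold and rare are names introduced here for illustration.

```r
# Assumed copy of the question's data, with animals kept as character
x3 <- data.frame(
  animals = c("Cat", "Dog", "Cat", "Elephant", "Cat",
              "Zebra", "Cow", "Cow", "Sheep"),
  number = c(12, 5, 19, 6, 1, 20, 3, 11, 4),
  stringsAsFactors = FALSE
)

threshold <- 2  # hypothetical name for the frequency threshold

counts <- table(x3$animals)                 # frequency of each animal
rare <- names(counts)[counts < threshold]   # animals seen fewer than threshold times

# Overwrite every rare animal, then convert to a factor at the end
x3$animals[x3$animals %in% rare] <- "Other"
x3$animals <- factor(x3$animals)
```

Converting to a factor only after the replacement avoids leftover unused levels such as "Dog" or "Zebra".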