How to replace multiple values across multiple columns in an R data frame?

Date: 2019-02-26 22:42:53

Tags: r dataframe

Say I have the following data frame (the real one has 10 labelx columns):

id <- c(1,2,3,4,5,6,7,8)
label1 <- c("apple","shoe","banana","hat","dog","radio","tree","pie")
label2 <- c("apple","sneaker","fruit","beanie","pet","ipod","doug fir","pie")
df <- data.frame(id,label1,label2)

I want to replace every item in the label columns with the word that categorizes it:

food <- c("apple","banana","pie","fruit")
clothing <- c("shoe","hat","beanie")
entertainment <- c("radio","ipod","mp3 player","phone")
forest <- c("tree","doug fir","redwood","forest")

I tried the following:

column_list <- c("label1","label2")
new_df <- df

for(i in 1:2) {
  new_df <- new_df %>%
  mutate(parse(text=column_list[i-1]) = replace(parse(text=column_list[i-1]),
                      (parse(text=column_list[i-1]) %in% food),
                      "food"))
}

I don't have to do it this way; anything simpler is fine too. Tidyverse is preferred.
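The attempt above fails for two reasons: column_list[i - 1] indexes from 0 (R vectors are 1-indexed, so column_list[0] is empty), and R does not accept a call such as parse(...) as an argument name inside mutate(). A minimal sketch of the same loop written with tidy evaluation follows, assuming dplyr's !!name := syntax and the .data pronoun; the categories list is a hypothetical helper, and the data is rebuilt with stringsAsFactors = FALSE because replace() cannot insert a new value such as "food" into a factor.

```r
library(dplyr)

# Question's data, kept as character columns (not factors)
id <- c(1, 2, 3, 4, 5, 6, 7, 8)
label1 <- c("apple", "shoe", "banana", "hat", "dog", "radio", "tree", "pie")
label2 <- c("apple", "sneaker", "fruit", "beanie", "pet", "ipod", "doug fir", "pie")
df <- data.frame(id, label1, label2, stringsAsFactors = FALSE)

food <- c("apple", "banana", "pie", "fruit")
clothing <- c("shoe", "hat", "beanie")
entertainment <- c("radio", "ipod", "mp3 player", "phone")
forest <- c("tree", "doug fir", "redwood", "forest")

# Hypothetical helper: category name -> member items
categories <- list(food = food, clothing = clothing,
                   entertainment = entertainment, forest = forest)

column_list <- c("label1", "label2")
new_df <- df

# Loop over the names directly instead of indexing with i - 1
for (col in column_list) {
  for (category in names(categories)) {
    # !!col := ... sets the column named by the string in col;
    # .data[[col]] reads that same column.
    # Each pass sees earlier replacements; that is safe here because no
    # category name appears in another category's item list.
    new_df <- new_df %>%
      mutate(!!col := replace(.data[[col]],
                              .data[[col]] %in% categories[[category]],
                              category))
  }
}
new_df
```

Items that match no category ("dog", "sneaker", "pet") are left unchanged by this loop.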

3 answers:

Answer 0 (score: 2)

One option is to use mutate_at() together with a nested ifelse():

library(dplyr)

# ~ ... is a formula lambda; . stands for the current column
df %>%
  mutate_at(vars(contains("label")),
            ~ ifelse(. %in% food, "food",
                ifelse(. %in% clothing, "clothing",
                  ifelse(. %in% entertainment, "entertainment",
                    ifelse(. %in% forest, "forest", NA_character_)))))


  id        label1        label2
1  1          food          food
2  2      clothing          <NA>
3  3          food          food
4  4      clothing      clothing
5  5          <NA>          <NA>
6  6 entertainment entertainment
7  7        forest        forest
8  8          food          food

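mutate_at() selects the variables whose names contain "label" and then applies the nested ifelse() to each of them; items that match no category (sneaker, dog, pet) become NA.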
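The nested ifelse() can be flattened with dplyr::case_when(). The sketch below assumes dplyr >= 1.0.0 for across() (which supersedes mutate_at()) and rebuilds the question's data so it runs on its own; it should produce the same table as the output above.

```r
library(dplyr)  # across() requires dplyr >= 1.0.0

id <- c(1, 2, 3, 4, 5, 6, 7, 8)
label1 <- c("apple", "shoe", "banana", "hat", "dog", "radio", "tree", "pie")
label2 <- c("apple", "sneaker", "fruit", "beanie", "pet", "ipod", "doug fir", "pie")
df <- data.frame(id, label1, label2, stringsAsFactors = FALSE)

food <- c("apple", "banana", "pie", "fruit")
clothing <- c("shoe", "hat", "beanie")
entertainment <- c("radio", "ipod", "mp3 player", "phone")
forest <- c("tree", "doug fir", "redwood", "forest")

# One condition per line instead of nested ifelse() calls
out <- df %>%
  mutate(across(contains("label"), function(x) case_when(
    x %in% food ~ "food",
    x %in% clothing ~ "clothing",
    x %in% entertainment ~ "entertainment",
    x %in% forest ~ "forest",
    TRUE ~ NA_character_  # unlisted items become NA, as with the nested ifelse()
  )))
out
```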

Answer 1 (score: 2)

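Here is an approach using base R. The idea is to create a named vector in which the names are the individual items (apple, shoe, etc.) and the values are their categories (food, clothing, etc.). You can then look up the categories directly by indexing the vector with the items.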
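A minimal sketch of that named-vector lookup, assuming the question's data; the names categories, lookup and cols are illustrative and not necessarily what the answer used. as.character() matters if the columns are factors (the default for data.frame() before R 4.0), because indexing by a factor uses its integer codes rather than its labels.

```r
id <- c(1, 2, 3, 4, 5, 6, 7, 8)
label1 <- c("apple", "shoe", "banana", "hat", "dog", "radio", "tree", "pie")
label2 <- c("apple", "sneaker", "fruit", "beanie", "pet", "ipod", "doug fir", "pie")
df <- data.frame(id, label1, label2, stringsAsFactors = FALSE)

food <- c("apple", "banana", "pie", "fruit")
clothing <- c("shoe", "hat", "beanie")
entertainment <- c("radio", "ipod", "mp3 player", "phone")
forest <- c("tree", "doug fir", "redwood", "forest")

categories <- list(food = food, clothing = clothing,
                   entertainment = entertainment, forest = forest)

# Named vector: names are the items, values are their categories
lookup <- setNames(rep(names(categories), lengths(categories)),
                   unlist(categories, use.names = FALSE))

# Index the lookup by item name; unknown items (e.g. "dog") give NA
cols <- c("label1", "label2")
df[cols] <- lapply(df[cols], function(x) unname(lookup[as.character(x)]))
df
```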


Answer 2 (score: 0)

If all the labels are in a single list, you can use dplyr::recode:

library("dplyr")

labels <- list()

for (x in food)          { labels[x] = "food" }
for (x in clothing)      { labels[x] = "clothing" }
for (x in entertainment) { labels[x] = "entertainment" }
for (x in forest)        { labels[x] = "forest" }

df %>%
  mutate_at(vars(label1, label2), recode, !!!labels)
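The same lookup list can be built without one loop per category, and recode()'s .default argument decides what happens to unlisted items: without it, character values such as "sneaker" are left unchanged; with .default = NA_character_ they become NA, matching the ifelse() answer. A sketch, assuming character columns and a hypothetical categories list:

```r
library(dplyr)

id <- c(1, 2, 3, 4, 5, 6, 7, 8)
label1 <- c("apple", "shoe", "banana", "hat", "dog", "radio", "tree", "pie")
label2 <- c("apple", "sneaker", "fruit", "beanie", "pet", "ipod", "doug fir", "pie")
df <- data.frame(id, label1, label2, stringsAsFactors = FALSE)

food <- c("apple", "banana", "pie", "fruit")
clothing <- c("shoe", "hat", "beanie")
entertainment <- c("radio", "ipod", "mp3 player", "phone")
forest <- c("tree", "doug fir", "redwood", "forest")

categories <- list(food = food, clothing = clothing,
                   entertainment = entertainment, forest = forest)

# One list element per item, named by the item, holding its category
labels <- as.list(setNames(rep(names(categories), lengths(categories)),
                           unlist(categories, use.names = FALSE)))

# !!! splices the list into recode()'s arguments; .default maps the rest to NA
out <- df %>%
  mutate_at(vars(label1, label2), recode, !!!labels, .default = NA_character_)
out
```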