Say I have the following data frame (the real one has 10 labelx columns):
id <- c(1,2,3,4,5,6,7,8)
label1 <- c("apple","shoe","banana","hat","dog","radio","tree","pie")
label2 <- c("apple","sneaker","fruit","beanie","pet","ipod","doug fir","pie")
df <- data.frame(id,label1,label2)
I want to replace every item in the label columns with the word that categorizes it.
food <- c("apple","banana","pie","fruit")
clothing <- c("shoe","hat","beanie")
entertainment <- c("radio","ipod","mp3 player","phone")
forest <- c("tree","doug fir","redwood","forest")
I tried the following:
column_list <- c("label1","label2")
new_df <- df
for(i in 1:2) {
  new_df <- new_df %>%
    mutate(parse(text=column_list[i-1]) = replace(parse(text=column_list[i-1]),
                                                  (parse(text=column_list[i-1]) %in% food),
                                                  "food"))
}
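It doesn't have to be done this way; something simpler is fine. Tidyverse preferred. How do I replace multiple values across multiple columns in an R data frame?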
Answer 0 (score: 2)
One possibility is to use mutate_at() together with nested ifelse() calls:
library(dplyr)
df %>%
  mutate_at(vars(contains("label")),
            # funs() is deprecated in recent dplyr; a ~ lambda is equivalent here
            ~ ifelse(. %in% food, "food",
              ifelse(. %in% clothing, "clothing",
              ifelse(. %in% entertainment, "entertainment",
              ifelse(. %in% forest, "forest", NA_character_)))))
  id        label1        label2
1  1          food          food
2  2      clothing          <NA>
3  3          food          food
4  4      clothing      clothing
5  5          <NA>          <NA>
6  6 entertainment entertainment
7  7        forest        forest
8  8          food          food
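mutate_at() selects the variables whose names contain "label" and applies the nested ifelse() to each of them. Values that match none of the category vectors (sneaker, dog, pet) become NA.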
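With ten label columns and several categories, the nested ifelse() can get hard to read. A minimal sketch of the same logic using across() and case_when() instead, assuming dplyr 1.0.0 or later; this is a swapped-in alternative, not the answerer's code, and `result` is an illustrative name:

```r
library(dplyr)

# Data and category vectors from the question
df <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 8),
  label1 = c("apple", "shoe", "banana", "hat", "dog", "radio", "tree", "pie"),
  label2 = c("apple", "sneaker", "fruit", "beanie", "pet", "ipod", "doug fir", "pie"),
  stringsAsFactors = FALSE
)
food <- c("apple", "banana", "pie", "fruit")
clothing <- c("shoe", "hat", "beanie")
entertainment <- c("radio", "ipod", "mp3 player", "phone")
forest <- c("tree", "doug fir", "redwood", "forest")

# Apply the same category mapping to every column whose name starts with "label"
result <- df %>%
  mutate(across(starts_with("label"), ~ case_when(
    .x %in% food ~ "food",
    .x %in% clothing ~ "clothing",
    .x %in% entertainment ~ "entertainment",
    .x %in% forest ~ "forest",
    TRUE ~ NA_character_  # anything unmatched becomes NA
  )))
```

starts_with("label") picks up all labelx columns without listing them, so the same call should work unchanged on the 10-column data.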
Answer 1 (score: 2)
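Here is an approach using base R. The idea is to create a named vector whose names are the individual items (apple, shoe, etc.) and whose values are the categories (food, clothing, etc.). The categories can then be looked up directly by name.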
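A minimal sketch of this named-vector idea in base R, assuming the data frame and category vectors from the question. The names `categories`, `lookup`, `label_cols`, and `new_df` are illustrative and not taken from the original answer:

```r
# Data and category vectors from the question
df <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 8),
  label1 = c("apple", "shoe", "banana", "hat", "dog", "radio", "tree", "pie"),
  label2 = c("apple", "sneaker", "fruit", "beanie", "pet", "ipod", "doug fir", "pie"),
  stringsAsFactors = FALSE
)
food <- c("apple", "banana", "pie", "fruit")
clothing <- c("shoe", "hat", "beanie")
entertainment <- c("radio", "ipod", "mp3 player", "phone")
forest <- c("tree", "doug fir", "redwood", "forest")

# Named vector: names are the items, values are their categories
categories <- list(food = food, clothing = clothing,
                   entertainment = entertainment, forest = forest)
lookup <- setNames(rep(names(categories), lengths(categories)),
                   unlist(categories, use.names = FALSE))

# Index the lookup by each label column; items with no entry come back as NA
label_cols <- c("label1", "label2")
new_df <- df
new_df[label_cols] <- lapply(df[label_cols],
                             function(col) unname(lookup[as.character(col)]))
```

Adding a category only requires another entry in `categories`; the lookup code itself does not change.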
Answer 2 (score: 0)
If all of the labels are collected in a single list, you can use dplyr::recode.
library("dplyr")
labels <- list()
for (x in food) { labels[x] = "food" }
for (x in clothing) { labels[x] = "clothing" }
for (x in entertainment) { labels[x] = "entertainment" }
for (x in forest) { labels[x] = "forest" }
df %>%
  mutate_at(vars(label1, label2), recode, !!!labels)
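Note that recode() leaves values without a replacement (sneaker, dog, pet) unchanged by default. A minimal sketch, assuming you would rather get NA for those as in the ifelse() answer, passes recode()'s `.default` argument through mutate_at(); `result` is an illustrative name:

```r
library(dplyr)

# Data and category vectors from the question
df <- data.frame(
  id = c(1, 2, 3, 4, 5, 6, 7, 8),
  label1 = c("apple", "shoe", "banana", "hat", "dog", "radio", "tree", "pie"),
  label2 = c("apple", "sneaker", "fruit", "beanie", "pet", "ipod", "doug fir", "pie"),
  stringsAsFactors = FALSE
)
food <- c("apple", "banana", "pie", "fruit")
clothing <- c("shoe", "hat", "beanie")
entertainment <- c("radio", "ipod", "mp3 player", "phone")
forest <- c("tree", "doug fir", "redwood", "forest")

# Build the item -> category list exactly as in the answer
labels <- list()
for (x in food) { labels[x] <- "food" }
for (x in clothing) { labels[x] <- "clothing" }
for (x in entertainment) { labels[x] <- "entertainment" }
for (x in forest) { labels[x] <- "forest" }

# .default replaces every value that has no entry in `labels`
result <- df %>%
  mutate_at(vars(label1, label2), recode, !!!labels,
            .default = NA_character_)
```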