Question

I have been using base R, but I would like to switch to dplyr. This is what I have been doing:

data$newvariable <- 0
data$newvariable[data$oldvariable=="happy"] <- "good"
data$newvariable[data$oldvariable=="unhappy"] <- "bad"
data$newvariable[data$oldvariable=="depressed"] <- "super_bad"

Answer 1

If oldvariable is a factor and you don't mind newvariable being a factor as well:

library(dplyr)

set.seed(111)
data = data.frame(
  oldvariable = sample(c("happy", "unhappy", "depressed"), 10, replace = TRUE))

data %>% mutate(newvariable = recode_factor(oldvariable,
  "happy" = "good", "unhappy" = "bad", "depressed" = "super_bad"))


   oldvariable newvariable
1      unhappy         bad
2    depressed   super_bad
3    depressed   super_bad
4    depressed   super_bad
5        happy        good
6    depressed   super_bad
7        happy        good
8    depressed   super_bad
9      unhappy         bad
10       happy        good
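
As a related sketch (not part of the answer above): when the mapping grows long, it may be easier to keep it in a named character vector and index into that vector inside mutate. The name lookup and the small example data frame are assumptions made for illustration; values missing from the lookup come out as NA.

```r
library(dplyr)

# Hypothetical lookup table: names are the old values, elements the new ones
lookup <- c(happy = "good", unhappy = "bad", depressed = "super_bad")

data <- data.frame(
  oldvariable = c("happy", "unhappy", "depressed", "happy")
)

# as.character() guards against oldvariable being a factor, since indexing
# with a factor would use its integer codes rather than its labels;
# unname() drops the names carried over from the lookup vector
result <- data %>%
  mutate(newvariable = unname(lookup[as.character(oldvariable)]))

print(result)
```

Unlike recode_factor, this leaves newvariable as a character column.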

Answer 2

In dplyr, we can use case_when to assign new values to newvariable based on oldvariable.

library(dplyr)

data = data.frame(
  oldvariable = c("happy", "unhappy", "depressed")
)

data %>%
  mutate(newvariable = case_when(
    oldvariable == "happy" ~ "good",
    oldvariable == "unhappy" ~ "bad",
    oldvariable == "depressed" ~ "super_bad"
  ))
#>   oldvariable newvariable
#> 1       happy        good
#> 2     unhappy         bad
#> 3   depressed   super_bad
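
One thing to be aware of: rows that match none of the conditions get NA from case_when. If a fallback is wanted, a final catch-all condition can supply it. The sketch below is a minimal illustration; the extra value "angry" and the fallback label "unknown" are made up for the example.

```r
library(dplyr)

data <- data.frame(
  oldvariable = c("happy", "unhappy", "depressed", "angry")
)

result <- data %>%
  mutate(newvariable = case_when(
    oldvariable == "happy" ~ "good",
    oldvariable == "unhappy" ~ "bad",
    oldvariable == "depressed" ~ "super_bad",
    TRUE ~ "unknown"  # catch-all: used for any row not matched above
  ))

print(result)
```

Conditions are evaluated in order, so the catch-all has to come last.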

Is there a way to use dplyr in R to create a new column based on the values of another column?
