Add a new column based on the string values of an existing column in RStudio

Asked: 2019-10-09 20:03:55

Tags: r string dplyr multiple-columns

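I have a dataset with multiple columns in R. One column, room_type, contains the string values Entire home/apt, Shared room, or Private room, or is empty. I want to create a new column, room_type_new, whose string values are derived from room_type. Note: the dataset has more than 100,000 rows.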

See below:

room_type          room_type_new
Entire home/apt    Entire home
Private room       Shared home
Shared room        Shared home
NA                 NULL

I tried this code, which prints the correct output but does not store the new values in room_type_new:

data1$room_type <- as.character(data1$room_type)
data1$room_type_new <- NA
data1$room_type_new <- as.character(data1$room_type_new)

    data1 %>%
      mutate(room_type_new = case_when(.$room_type %in% c("Entire home/apt") ~ "Entire home", .$room_type %in% c("Private room", "Shared room") ~ "Shared home"))

2 answers:

Answer 0: (score: 1)

Here is one option using case_when:

library(dplyr)
library(stringr)
df1 %>%
   mutate(room_type_new = case_when(
     str_detect(room_type, "Entire") ~ 'Entire home',
     is.na(room_type) ~ NA_character_,
     TRUE ~ "Shared home"))
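
To try the case_when approach end to end, here is a minimal sketch on a toy data frame. The four sample rows are assumptions that mirror the table in the question, not the real data. It matches the exact strings with == and %in% rather than str_detect, and it assigns the result back to df1 so the new column is kept.

```r
library(dplyr)

# Toy data standing in for the real dataset (hypothetical rows)
df1 <- data.frame(
  room_type = c("Entire home/apt", "Private room", "Shared room", NA),
  stringsAsFactors = FALSE
)

# Assign the result back so room_type_new is stored in df1
df1 <- df1 %>%
  mutate(room_type_new = case_when(
    room_type == "Entire home/apt" ~ "Entire home",
    room_type %in% c("Private room", "Shared room") ~ "Shared home",
    TRUE ~ NA_character_  # anything else, including NA, stays missing
  ))

print(df1)
```

Exact matching avoids accidentally catching other values that merely contain "Entire"; whether that matters depends on what else appears in room_type.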

Answer 1: (score: 0)

Build the df:

df <- data.frame(room_type = c("Entire home/apt", "Private room", "Shared room"), stringsAsFactors = FALSE)

Assuming df$room_type_new has only two unique values, a base R one-liner:

df$room_type_new <- ifelse(grepl("Entire home/apt", df$room_type), "Entire home", "Shared home")

If df$room_type_new has more than two unique values, a base R one-liner with nested ifelse:

df$room_type_new <- ifelse(grepl("Entire home/apt", df$room_type), "Entire home", ifelse(grepl("Private room|Shared room", df$room_type), "Shared home", ""))
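
One thing to watch with the grepl versions: grepl returns FALSE for NA input, so rows with a missing room_type fall into the last branch rather than staying NA. As an alternative to nested ifelse (a different technique, not the one above), a named lookup vector is sketched below. The name lookup and the NA row in the sample data are assumptions added for illustration.

```r
# Toy data; the NA row is added here to show how missing values behave
df <- data.frame(
  room_type = c("Entire home/apt", "Private room", "Shared room", NA),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: names are the old values, elements the new ones
lookup <- c(
  "Entire home/apt" = "Entire home",
  "Private room"    = "Shared home",
  "Shared room"     = "Shared home"
)

# Index by name; NA or unlisted values come back as NA
df$room_type_new <- unname(lookup[as.character(df$room_type)])

print(df)
```

The as.character call guards against room_type being a factor, in which case indexing would use the integer codes instead of the labels.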

The output is not stored anywhere because your code does not assign the result to an object:

data1 <- data1 %>%
  mutate(room_type_new = case_when(
    room_type %in% c("Entire home/apt") ~ "Entire home",
    room_type %in% c("Private room", "Shared room") ~ "Shared home"
  ))
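
The difference between printing and assigning can be checked directly. The sketch below uses a two-row toy data1 (assumed values, not the real dataset) and the helper names before and after, which are only for illustration.

```r
library(dplyr)

# Toy stand-in for data1 (hypothetical rows)
data1 <- data.frame(
  room_type = c("Entire home/apt", "Shared room"),
  stringsAsFactors = FALSE
)

# Without assignment: the pipeline returns a new data frame that is only printed
data1 %>%
  mutate(room_type_new = if_else(room_type == "Entire home/apt", "Entire home", "Shared home"))
before <- "room_type_new" %in% names(data1)  # FALSE: data1 itself is unchanged

# With assignment: the returned data frame replaces data1
data1 <- data1 %>%
  mutate(room_type_new = if_else(room_type == "Entire home/apt", "Entire home", "Shared home"))
after <- "room_type_new" %in% names(data1)   # TRUE: the column is now stored
```

mutate never modifies its input in place, so the result always has to be captured with <- to be kept.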