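I have a dataset in R with several columns. One column, room_type, contains the string values Entire home/apt, Shared room, or Private room, or is empty (NA). I want to create a new column, room_type_new, whose string values are derived from room_type. Note: the dataset has more than 100,000 rows.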
The desired mapping:

room_type         room_type_new
Entire home/apt   Entire home
Private room      Shared home
Shared room       Shared home
NA                NULL
I tried the following code. It prints the correct output, but the new values never end up in room_type_new:
library(dplyr)  # provides %>%, mutate() and case_when()
data1$room_type <- as.character(data1$room_type)
data1$room_type_new <- NA
data1$room_type_new <- as.character(data1$room_type_new)
data1 %>%
  mutate(room_type_new = case_when(.$room_type %in% c("Entire home/apt") ~ "Entire home",
                                   .$room_type %in% c("Private room", "Shared room") ~ "Shared home"))
Answer 0 (score: 1)
Here is an option using case_when:
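library(dplyr)
library(stringr)

# Assign the result back so the new column is kept
df1 <- df1 %>%
  mutate(room_type_new = case_when(
    str_detect(room_type, "Entire") ~ "Entire home",    # "Entire home/apt"
    is.na(room_type)                ~ NA_character_,    # keep missing values missing
    TRUE                            ~ "Shared home"     # everything else, e.g. "Private room", "Shared room"
  ))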
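A minimal, self-contained check of the case_when() approach above. It assumes a small hypothetical toy data frame (df1 here is made-up test data that includes one NA), not the asker's real data1. Note that str_detect() returns NA for a missing input and case_when() treats an NA condition as "not matched", which is why the is.na() branch is still reached for the empty row.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data covering every room_type value from the question, plus a missing value
df1 <- data.frame(
  room_type = c("Entire home/apt", "Private room", "Shared room", NA),
  stringsAsFactors = FALSE
)

# Same case_when() logic as the answer, assigned back to df1
df1 <- df1 %>%
  mutate(room_type_new = case_when(
    str_detect(room_type, "Entire") ~ "Entire home",
    is.na(room_type)                ~ NA_character_,
    TRUE                            ~ "Shared home"
  ))

print(df1)
```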
Answer 1 (score: 0)
Build an example df:
df <- data.frame(room_type = c("Entire home/apt", "Private room", "Shared room"), stringsAsFactors = FALSE)
Assuming df$room_type_new takes only two unique values, a one-liner in base R:
df$room_type_new <- ifelse(grepl("Entire home/apt", df$room_type), "Entire home", "Shared home")
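One caveat worth checking with the one-liner above: grepl() returns FALSE (not NA) for a missing input, so rows where room_type is NA would be labelled "Shared home" rather than staying empty as the question's table expects. A minimal sketch of a variant that tests is.na() first, assuming a hypothetical toy df that includes one NA; fixed = TRUE is an added choice so the pattern is matched as a literal string.

```r
# Hypothetical toy data with a missing value
df <- data.frame(
  room_type = c("Entire home/apt", "Private room", "Shared room", NA),
  stringsAsFactors = FALSE
)

# grepl() on a missing value gives FALSE, not NA
grepl("Entire home/apt", NA)  # FALSE

# Check for NA first, then apply the two-value rule to the rest
df$room_type_new <- ifelse(is.na(df$room_type), NA_character_,
                           ifelse(grepl("Entire home/apt", df$room_type, fixed = TRUE),
                                  "Entire home", "Shared home"))
print(df)
```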
If df$room_type_new needs more than two unique values, use nested ifelse() calls in base R:
df$room_type_new <- ifelse(grepl("Entire home/apt", df$room_type), "Entire home", ifelse(grepl("Private room|Shared room", df$room_type), "Shared home", ""))
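When the labels are fixed strings, another base R option (a swapped-in technique, not part of either answer) is a named lookup vector, which avoids nesting ifelse() calls as the number of categories grows. This is a sketch under assumptions: the vector name room_map and the toy df are hypothetical, and room_type must be character (as the asker ensures with as.character()), because indexing by a factor uses its integer codes instead of its labels. Any label missing from room_map, and any NA, comes back as NA.

```r
# Hypothetical mapping from the original labels to the new ones
room_map <- c(
  "Entire home/apt" = "Entire home",
  "Private room"    = "Shared home",
  "Shared room"     = "Shared home"
)

# Hypothetical toy data; room_type is character, not factor
df <- data.frame(
  room_type = c("Entire home/apt", "Private room", "Shared room", NA),
  stringsAsFactors = FALSE
)

# Vectorized lookup by name; unname() drops the names carried over from room_map
df$room_type_new <- unname(room_map[df$room_type])
print(df)
```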
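The reason the output isn't stored anywhere is that your code never assigns the result to an object. mutate() returns a modified copy of the data frame; unless you assign that copy (for example back to data1), it is printed and then discarded: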
data1 <-
  data1 %>%
  mutate(room_type_new = case_when(room_type %in% c("Entire home/apt") ~ "Entire home",
                                   room_type %in% c("Private room", "Shared room") ~ "Shared home"))
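A small sketch of the difference the assignment makes, assuming a hypothetical four-row stand-in for data1. Rows that match no condition (here the NA row, since NA %in% ... is FALSE) get NA from case_when(), which matches the "empty" case in the question.

```r
library(dplyr)

# Hypothetical stand-in for the asker's data1
data1 <- data.frame(
  room_type = c("Entire home/apt", "Private room", "Shared room", NA),
  stringsAsFactors = FALSE
)

# Without assignment: a modified copy is printed, data1 itself is unchanged
data1 %>% mutate(room_type_new = "placeholder")
"room_type_new" %in% names(data1)  # FALSE

# With assignment: the new column is stored in data1
data1 <- data1 %>%
  mutate(room_type_new = case_when(
    room_type %in% "Entire home/apt"                ~ "Entire home",
    room_type %in% c("Private room", "Shared room") ~ "Shared home"
  ))
"room_type_new" %in% names(data1)  # TRUE
```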