For each row of my dataset, I have several string variables.
For example:
var1 var2 var3 var4
1 mother daughter house tea
2 mother father daughter NA
3 house tea pencil paper
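I want to create a new variable (new) that flags whether any of the following words appears in the row:
mother, father, daughter. The result should look like this:
var1 var2 var3 var4 new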
1 mother daughter house tea 1
2 mother father daughter NA 1
3 house tea pencil paper 0
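Is there a way to do this? Unfortunately, I don't even know where to start.
Answer 0 (score: 0)
Yes, there is a way. Here is one, which checks each row for any of the target words: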
our_strings <- c("mother", "daughter", "father")
df$new <- as.integer(apply(df, 1, function(x) any(x %in% our_strings)))
df
# V1 var1 var2 var3 var4 new
# 1 1 mother daughter house tea 1
# 2 2 mother father daughter <NA> 1
# 3 3 house tea pencil paper 0
Reproducible data:
df <- data.frame(
V1 = 1:3,
var1 = c("mother", "mother", "house"),
var2 = c("daughter", "father", "tea"),
var3 = c("house", "daughter","pencil"),
var4 = c("tea", NA, "paper")
)
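The apply call above scans every column, including the V1 identifier. As a minimal sketch, assuming the string columns are named var1 to var4 (the cols vector is a hypothetical helper, not part of the original answer), the check can be restricted to those columns:

```r
# Example data mirroring the reproducible data in this answer
df <- data.frame(
  V1   = 1:3,
  var1 = c("mother", "mother", "house"),
  var2 = c("daughter", "father", "tea"),
  var3 = c("house", "daughter", "pencil"),
  var4 = c("tea", NA, "paper"),
  stringsAsFactors = FALSE
)

our_strings <- c("mother", "daughter", "father")
cols <- paste0("var", 1:4)  # assumed names of the string columns

# Row by row: TRUE if any selected cell matches one of the target words.
# NA cells never match, because NA %in% our_strings is FALSE.
df$new <- as.integer(apply(df[cols], 1, function(x) any(x %in% our_strings)))
df$new
```

Selecting the columns explicitly avoids accidental matches in unrelated columns if the data frame later gains more fields.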
Answer 1 (score: 0)
We can use rowSums. Comparing with == returns NA for missing cells, hence na.rm = TRUE:
df$new <- +(rowSums(df == "mother" | df == "daughter" | df == "father", na.rm = TRUE) > 0)
df
# var1 var2 var3 var4 new
#1 mother daughter house tea 1
#2 mother father daughter <NA> 1
#3 house tea pencil paper 0
Or with lapply:
df$new <- +(Reduce(`|`, lapply(df, `%in%`, c("mother", "daughter", "father"))))
We can also use pmap_int from purrr:
library(dplyr)
library(purrr)
df %>%
  mutate(new = pmap_int(., ~ +(any(c(...) %in% c("mother", "daughter", "father")))))
Data:
df <- structure(list(var1 = c("mother", "mother", "house"), var2 =
c("daughter", "father", "tea"), var3 = c("house", "daughter", "pencil"),
var4 = c("tea", NA, "paper")), row.names = c("1", "2", "3"), class = "data.frame")
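The lapply/Reduce one-liner in this answer can be hard to read at first. The following is a rough step-by-step sketch of the same idea; the names targets, hits, and any_hit are introduced here for illustration only:

```r
df <- data.frame(
  var1 = c("mother", "mother", "house"),
  var2 = c("daughter", "father", "tea"),
  var3 = c("house", "daughter", "pencil"),
  var4 = c("tea", NA, "paper"),
  stringsAsFactors = FALSE
)
targets <- c("mother", "daughter", "father")

# Step 1: one logical vector per column.
# %in% returns FALSE (not NA) for missing values, so no na.rm is needed.
hits <- lapply(df, function(col) col %in% targets)

# Step 2: combine the per-column vectors element-wise with OR.
any_hit <- Reduce(`|`, hits)

# Step 3: convert TRUE/FALSE to 1/0.
df$new <- as.integer(any_hit)
df
```

Because this works column by column, it avoids converting the data frame to a matrix row by row, which tends to matter more as the number of rows grows.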