Find string values in each row and create a new variable in R

Time: 2019-09-25 10:10:03

Tags: r

Each row of my dataset has several string variables.

For example:

  var1     var2      var3      var4
1 mother   daughter  house     tea 
2 mother   father    daughter  NA
3 house    tea       pencil    paper

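I want to create a new variable (new) that is 1 if any of the following words appears in the row and 0 otherwise:

mother, father, daughter. Like this: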

  var1     var2      var3      var4     new
1 mother   daughter  house     tea      1
2 mother   father    daughter  NA       1
3 house    tea       pencil    paper    0

Is there a way to do this? Unfortunately, I don't even know where to start.

2 answers:

Answer 0: (score: 0)

Yes, there is a way. Here is one:

# Words to look for
our_strings <- c("mother", "daughter", "father")
# For each row, check whether any cell matches one of the words
df$new <- as.integer(apply(df, 1, function(x) any(x %in% our_strings)))
df

#   V1   var1     var2     var3  var4 new
# 1  1 mother daughter    house   tea   1
# 2  2 mother   father daughter  <NA>   1
# 3  3  house      tea   pencil paper   0

Reproducible data:

df <- data.frame(
  V1   = 1:3, 
  var1 = c("mother", "mother", "house"), 
  var2 = c("daughter", "father", "tea"), 
  var3 = c("house", "daughter","pencil"), 
  var4 = c("tea", NA, "paper")
)
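
Since `apply()` converts the whole data frame to a character matrix before looping over rows, it may be tidier to restrict the search to the string columns. The following is only a minimal sketch: the helper `flag_rows()` and its `cols` argument are hypothetical names introduced for illustration and are not part of the answer above.

```r
# Hypothetical helper (illustration only): flag rows where any of the
# selected columns contains one of `words`.
flag_rows <- function(data, words, cols = names(data)) {
  m <- as.matrix(data[cols])  # character matrix of the chosen columns
  # %in% returns FALSE for NA cells, so missing values never match
  as.integer(apply(m, 1, function(x) any(x %in% words)))
}

df <- data.frame(
  V1   = 1:3,
  var1 = c("mother", "mother", "house"),
  var2 = c("daughter", "father", "tea"),
  var3 = c("house", "daughter", "pencil"),
  var4 = c("tea", NA, "paper"),
  stringsAsFactors = FALSE
)

df$new <- flag_rows(df, c("mother", "daughter", "father"),
                    cols = paste0("var", 1:4))
```

Passing `cols` keeps the id column `V1` out of the comparison, which probably matters little here but avoids accidental matches in wider data sets.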

Answer 1: (score: 0)

We can use rowSums:

df$new <- +(rowSums(df == "mother" | df == "daughter" | df == "father", na.rm = TRUE) > 0)

df
#    var1     var2     var3  var4 new
#1 mother daughter    house   tea   1
#2 mother   father daughter  <NA>   1
#3  house      tea   pencil paper   0
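
The `na.rm = TRUE` argument matters because `==` returns `NA` for missing cells. A small sketch of the difference, using the same example data (the variable names `hits`, `without_na_rm` and `with_na_rm` are only illustrative):

```r
df <- data.frame(
  var1 = c("mother", "mother", "house"),
  var2 = c("daughter", "father", "tea"),
  var3 = c("house", "daughter", "pencil"),
  var4 = c("tea", NA, "paper"),
  stringsAsFactors = FALSE
)

# Logical matrix: TRUE where a cell equals one of the words, NA where the cell is NA
hits <- df == "mother" | df == "daughter" | df == "father"

without_na_rm <- rowSums(hits)                # row 2 becomes NA because of the NA cell
with_na_rm    <- rowSums(hits, na.rm = TRUE)  # NA cells are ignored: 2, 3, 0

new <- +(with_na_rm > 0)
```

Without `na.rm = TRUE`, row 2 would end up as `NA` instead of 1, even though it clearly contains matching words.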

Or with lapply:

df$new <- +(Reduce(`|`, lapply(df, `%in%`, c("mother", "daughter", "father"))))
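
To make the one-liner easier to follow, here is the same idea split into steps. This is just a sketch; the intermediate names `words`, `per_column` and `any_hit` are introduced for illustration.

```r
df <- data.frame(
  var1 = c("mother", "mother", "house"),
  var2 = c("daughter", "father", "tea"),
  var3 = c("house", "daughter", "pencil"),
  var4 = c("tea", NA, "paper"),
  stringsAsFactors = FALSE
)
words <- c("mother", "daughter", "father")

# One logical vector per column: does each cell match one of the words?
# %in% yields FALSE (not NA) for missing values, so no na.rm is needed.
per_column <- lapply(df, `%in%`, words)

# Combine the columns element-wise with OR: TRUE if any column matched in that row
any_hit <- Reduce(`|`, per_column)

# Unary plus converts TRUE/FALSE to 1/0
df$new <- +any_hit
```

Because the work is done column by column rather than row by row, this version avoids converting the data frame to a matrix.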

We can also use pmap_int from purrr:

library(dplyr)
library(purrr)

df %>%
  mutate(new = pmap_int(., ~+(any(c(...) %in% c("mother", "daughter", "father")))))

Data

df <- structure(list(var1 = c("mother", "mother", "house"), var2 = 
c("daughter", "father", "tea"), var3 = c("house", "daughter", "pencil"), 
var4 = c("tea", NA, "paper")), row.names = c("1", "2", "3"), class = "data.frame")