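This is a question about writing logical conditions efficiently.
Suppose I want to recode a variable whenever any column in a set equals a particular value.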
library(tibble)
test <- tibble(
CompanyA = rep(c(0:1),5),
CompanyB = rep(c(0),10),
CompanyC = c(1,1,1,1,0,0,1,1,1,1)
)
test
A basic approach would be:
test$newvar <- ifelse(test$CompanyA==1 | test$CompanyB == 1 | test$CompanyC == 1,-99,0)
table(test$newvar)
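But what if I have dozens of columns? I don't want to write out CompanyA, CompanyB, and so on. Is there a way to use something like an %in% statement? Here is an approach that is obviously wrong: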
condition <- columns %in% c("CompanyA", "CompanyB", "CompanyC") # obviously doesn't work
test$newvar[condition] <- 1
Or is there an even simpler way, for example: if CompanyA:CompanyC == 1, then do...?
Answer 0 (score: 1)
By reshaping test from wide to long, I was able to create a column that tests whether any of the CompanyX columns contains a 1 in a given row.
# load necessary packages ----
library(tidyverse)
# load necessary data ----
test <-
tibble(CompanyA = rep(c(0:1),5),
CompanyB = rep(c(0),10),
CompanyC = c(1,1,1,1,0,0,1,1,1,1)) %>%
# create an 'id' column
mutate(id = 1:n())
# calculations -----
new.var <-
test %>%
# transform data from wide to long
gather(key = "company", value = "value", -id) %>%
# for each 'id' value
# test if any 'value' is equal to 1
# if so, return -99; else return 0
group_by(id) %>%
summarize(new_var = if_else(any(value == 1), -99, 0))
# left join new.var onto test ---
test <-
test %>%
left_join(new.var, by = "id")
# view results ---
test
# A tibble: 10 x 5
# CompanyA CompanyB CompanyC id new_var
# <int> <dbl> <dbl> <int> <dbl>
# 1 0 0 1 1 -99
# 2 1 0 1 2 -99
# 3 0 0 1 3 -99
# 4 1 0 1 4 -99
# 5 0 0 0 5 0
# 6 1 0 0 6 -99
# 7 0 0 1 7 -99
# 8 1 0 1 8 -99
# 9 0 0 1 9 -99
# 10 1 0 1 10 -99
# end of script #
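
For comparison, here is a minimal base R sketch that stays close to the %in% idea from the question and avoids the reshape and join. It is not part of the answer above. It assumes the same toy data, rebuilt as a plain data.frame so that no packages are needed. The names company_cols and any_one are only illustrative.

```r
# Same toy data as in the question, as a base data.frame (no packages needed)
test <- data.frame(
  CompanyA = rep(0:1, 5),
  CompanyB = rep(0, 10),
  CompanyC = c(1, 1, 1, 1, 0, 0, 1, 1, 1, 1)
)

# Select the columns by name with %in%; this gives one TRUE/FALSE per column
company_cols <- names(test) %in% c("CompanyA", "CompanyB", "CompanyC")

# test[company_cols] == 1 is a logical matrix (rows x selected columns);
# a row sum greater than 0 means at least one selected column equals 1
any_one <- rowSums(test[company_cols] == 1, na.rm = TRUE) > 0

# Recode: -99 if any selected column is 1 in that row, otherwise 0
test$newvar <- ifelse(any_one, -99, 0)
table(test$newvar)
```

With dozens of columns, the name vector could instead be built from a pattern, for example startsWith(names(test), "Company"). Within dplyr (version 1.0.4 or later), if_any() should express the same row-wise test inside mutate(), which is close to the CompanyA:CompanyC shorthand the question asks about.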