Question

I have a tibble like this:

library(tibble)

dat = tibble(a1 = c(23, NA, 3, 0, NA),
             a2 = c(NA, 6, 0, 9, NA),
             a3 = c(NA, NA, "censored", "censored", NA),
             a4 = c(NA, "censored", NA, NA, NA))

I want to create a new variable named "class" that satisfies the following conditions:

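if either a1 or a2 has a number not equal to 0, class = "yes";
if all variables starting with the letter "a" are NA, class = "no";
otherwise, class = "censored" (only one of these columns contains "censored", in which case class = "censored").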

Answer 1

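I tried to build an example using base R only. I'm not sure I understood all of the conditions correctly.

I believe there may be better solutions with dplyr or data.table, but I don't know your preference.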

library(tibble)

# create data
dat  = tibble(
  a1 = c(23, NA, 3, 0, NA),
  a2 = c(NA, 6, 0, 9, NA),
  a3 = c(NA, NA, "censored", "censored", NA),
  a4 = c(NA, "censored", NA, NA, NA)
)

# 1. if either a1 or a2 has a number not equal to 0, then class = "yes" ####

dat$class <- ifelse(dat$a1 != 0 | dat$a2 != 0, 'yes', NA)

# 2. if all variables that start with the letter "a" are NA, then class = "no" ####

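# identify column names starting with "a" and build an anchored pattern for grepl
# (anchoring with ^...$ stops "a1" from also matching a column such as "data1")
a_names <- names(dat)[grepl("^a", names(dat))]
pattern <- paste0("^(", paste(a_names, collapse = "|"), ")$")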

# check if all pattern cols are NA and apply "no" to dat$class,
# by comparing the row-wise count of NA cells with the number of pattern cols
a_cols <- dat[, grepl(pattern, colnames(dat))]
dat$class <- ifelse(rowSums(is.na(a_cols)) == ncol(a_cols), 'no', dat$class)


# 3. otherwise, class = "censored" (if one of these columns contains "censored", class = "censored") ####

# check if pattern cols contain "censored" and apply "censored" to dat$class,
# by counting the cells equal to "censored" in each row and checking for a count > 0

dat$class <-
  ifelse(rowSums(dat[, grepl(pattern, colnames(dat))] == "censored", na.rm = TRUE) > 0,
         "censored",
         dat$class)

In this example the columns starting with "a" could also be accessed by position with dat[, 1:4], but your real data may look different.
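A minimal sketch of selecting the "a" columns by name instead of by position, assuming the relevant columns keep the "a" prefix (`startsWith()` is base R, available since R 3.3.0). The `id` column is a hypothetical extra column added only to show what happens when the layout changes:

```r
library(tibble)

dat <- tibble(
  a1 = c(23, NA, 3, 0, NA),
  a2 = c(NA, 6, 0, 9, NA),
  a3 = c(NA, NA, "censored", "censored", NA),
  a4 = c(NA, "censored", NA, NA, NA)
)

# Logical vector: TRUE for every column whose name starts with "a"
a_cols <- startsWith(names(dat), "a")
names(dat[, a_cols])  # "a1" "a2" "a3" "a4"

# Hypothetical: put an extra "id" column in front of the data
dat_extra <- dat
dat_extra$id <- 1:5
dat_extra <- dat_extra[c("id", names(dat))]

# Positional indexing now picks up the wrong columns ...
names(dat_extra[, 1:4])  # "id" "a1" "a2" "a3"
# ... while name-based selection still finds the "a" columns
names(dat_extra[, startsWith(names(dat_extra), "a")])  # "a1" "a2" "a3" "a4"
```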

Update

An example based on the solution posted earlier by @NarimeneL. Note that the order of the conditions in case_when matters here!

library(tibble)
library(dplyr)  # also re-exports %>% (magrittr) and starts_with() (tidyselect)


# create data
dat  = tibble(
  a1 = c(23, NA, 3, 0, NA),
  a2 = c(NA, 6, 0, 9, NA),
  a3 = c(NA, NA, "censored", "censored", NA),
  a4 = c(NA, "censored", NA, NA, NA)
)


dat2 <- dat %>%
  select(starts_with("a")) %>%
  mutate(class = case_when(
    # "censored" is checked first, so it wins over the "yes" rule
    rowSums(. == "censored", na.rm = TRUE) > 0 ~ "censored",
    a1 != 0 | a2 != 0 ~ "yes",
    rowSums(is.na(.)) == ncol(.) ~ "no"
  ))
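To see why the order matters, here is a small sketch that evaluates the same three rules in two orders. `case_when()` returns the value of the first condition that is TRUE (conditions that evaluate to NA are skipped), so rows 2 to 4, which satisfy both the "yes" and the "censored" rule, change label depending on which rule comes first. It assumes dplyr 1.0.5 or later for `if_any()`/`if_all()`:

```r
library(dplyr)
library(tibble)

dat <- tibble(
  a1 = c(23, NA, 3, 0, NA),
  a2 = c(NA, 6, 0, 9, NA),
  a3 = c(NA, NA, "censored", "censored", NA),
  a4 = c(NA, "censored", NA, NA, NA)
)

# "yes" rule first: rows 2-4 are labelled "yes" even though they contain "censored"
yes_first <- dat %>%
  mutate(class = case_when(
    a1 != 0 | a2 != 0 ~ "yes",
    if_any(starts_with("a"), ~ .x == "censored") ~ "censored",
    if_all(starts_with("a"), is.na) ~ "no"
  ))

# "censored" rule first: rows 2-4 are now labelled "censored"
censored_first <- dat %>%
  mutate(class = case_when(
    if_any(starts_with("a"), ~ .x == "censored") ~ "censored",
    a1 != 0 | a2 != 0 ~ "yes",
    if_all(starts_with("a"), is.na) ~ "no"
  ))

yes_first$class       # "yes" "yes" "yes" "yes" "no"
censored_first$class  # "yes" "censored" "censored" "censored" "no"
```

Which order is right depends on how the question's rules are meant to be prioritised; the code only makes that choice explicit.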

Answer 2

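I'm a bit confused by the example data. If I understand the rules correctly, none of the rows in the example would be "censored", because a1 or a2 is always non-zero, except in the last row, which is all NA.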

library(dplyr)  # if_all() needs dplyr 1.0.5 or later

mutate(dat, class = case_when(
  a1 != 0 | a2 != 0 ~ "yes",
  if_all(starts_with("a"), is.na) ~ "no",
  TRUE ~ "censored"
))

# A tibble: 5 x 5
     a1    a2 a3       a4       class
  <dbl> <dbl> <chr>    <chr>    <chr>
1    23    NA NA       NA       yes  
2    NA     6 NA       censored yes  
3     3     0 censored NA       yes  
4     0     9 censored NA       yes  
5    NA    NA NA       NA       no
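A quick base R check of that observation on the example `dat`: evaluating the "yes" condition on its own shows it is TRUE for rows 1 to 4 and NA only for row 5, and row 5 is caught by the all-NA rule, so with the rules applied in the stated order nothing is left over for "censored". The variable names here are illustrative only:

```r
library(tibble)

dat <- tibble(
  a1 = c(23, NA, 3, 0, NA),
  a2 = c(NA, 6, 0, 9, NA),
  a3 = c(NA, NA, "censored", "censored", NA),
  a4 = c(NA, "censored", NA, NA, NA)
)

# Rule 1 on its own. NA != 0 is NA, but NA | TRUE is TRUE,
# so only row 5 (NA in both a1 and a2) is not TRUE
is_yes <- dat$a1 != 0 | dat$a2 != 0
is_yes  # TRUE TRUE TRUE TRUE NA

# Rule 2: rows where every column is NA
all_na <- rowSums(!is.na(dat)) == 0

# Rows that would fall through to "censored": not "yes" (FALSE or NA) and not all-NA
falls_through <- !(is_yes %in% TRUE) & !all_na
any(falls_through)  # FALSE: no row in the example reaches "censored"
```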

Answer 3

You can convert the tibble to a data frame like this:

dat = as.data.frame(dat)

Then you can create the new variable with the conditions:

library(dplyr)  # also re-exports %>% (magrittr) and starts_with() (tidyselect)


dat2 = dat %>%
  select(starts_with("a")) %>%
  mutate(class = case_when(
    a1 != 0 ~ "yes",
    a2 != 0 ~ "yes"
  ))

Create a new variable for the rows that have NA in all columns.
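With only the two "yes" conditions in the code above, `case_when()` returns NA for every row where no condition is TRUE, which here is row 5 (NA in every column). A minimal sketch, assuming dplyr 1.0.5 or later for `if_all()`, that confirms this and flags the all-NA rows so they can be given their own label. The column name `empty_row` is hypothetical and deliberately does not start with "a", so it cannot be picked up by `starts_with("a")`:

```r
library(dplyr)
library(tibble)

dat <- tibble(
  a1 = c(23, NA, 3, 0, NA),
  a2 = c(NA, 6, 0, 9, NA),
  a3 = c(NA, NA, "censored", "censored", NA),
  a4 = c(NA, "censored", NA, NA, NA)
)

res <- dat %>%
  mutate(
    # Only the "yes" rules: a row matching no condition gets NA
    class = case_when(
      a1 != 0 ~ "yes",
      a2 != 0 ~ "yes"
    ),
    # TRUE when every column starting with "a" is NA
    empty_row = if_all(starts_with("a"), is.na)
  )

res$class      # "yes" "yes" "yes" "yes" NA
res$empty_row  # FALSE FALSE FALSE FALSE TRUE
```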
