Question

Suppose I have the following dataset:

data = tibble::tibble(
  id = c("x", "y", "x"),
  inputA = c(1, NA, NA),
  inputB = c(2, 1, NA),
  inputC = c(3, 2, 3)
)

It looks like this:

# A tibble: 3 x 4
  id    inputA inputB inputC
  <chr>  <dbl>  <dbl>  <dbl>
1 x          1      2      3
2 y         NA      1      2
3 x         NA     NA      3

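For each id (that is, for each row), I want to create a variable that identifies that row's inputs. In other words, the new variable should list the names of the input columns whose values are not missing (NA) in that row.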

The desired output should look like this:

# A tibble: 3 x 5
  id    inputA inputB inputC inputs              
  <chr>  <dbl>  <dbl>  <dbl> <chr>               
1 x          1      2      3 inputA-inputB-inputC
2 y         NA      1      2 inputB-inputC       
3 x         NA     NA      3 inputC

The variable I want to create is inputs.

Answer 1

Using dplyr with rowwise():

library(dplyr)

cols <- names(data)[-1]

data %>%
  rowwise() %>%
  mutate(inputs = paste0(cols[!is.na(c_across(all_of(cols)))], collapse = '-'))

#   id    inputA inputB inputC inputs              
#  <chr>  <dbl>  <dbl>  <dbl> <chr>               
#1 x          1      2      3 inputA-inputB-inputC
#2 y         NA      1      2 inputB-inputC       
#3 x         NA     NA      3 inputC

In base R:

data$inputs <- apply(!is.na(data[cols]), 1, function(x) 
                     paste0(cols[x], collapse = '-'))
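
Both approaches appear to rely on the same per-row step: build a logical mask of the non-missing values, use it to pick the matching column names, and join those names with "-". The sketch below isolates that step on a single row. The named vectors row_values and all_missing are hypothetical stand-ins for one row of data and are not part of the original answer.

```r
# One row of the example data as a named vector
# (row 2: inputA is NA, inputB = 1, inputC = 2).
row_values <- c(inputA = NA, inputB = 1, inputC = 2)

# Logical mask marking the non-missing entries.
present <- !is.na(row_values)

# Keep the names of the non-missing entries and join them with "-".
inputs <- paste(names(row_values)[present], collapse = "-")
inputs
# [1] "inputB-inputC"

# Edge case: a row with no non-missing inputs yields an empty string.
all_missing <- c(inputA = NA, inputB = NA, inputC = NA)
empty_inputs <- paste(names(all_missing)[!is.na(all_missing)], collapse = "-")
```

Note that a row in which every input is NA produces "" rather than NA, so if such rows should be flagged as missing they would need separate handling.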
