Question

Consider the following dataset:

  patientID age age2 age3 equal
1         1  25   25   25  TRUE
2         2  34   34   32 FALSE
3         3  28   28   20 FALSE
4         4  52   18   19 FALSE

I want to set the equal column to TRUE when age, age2, and age3 are all equal to one another. I thought this would be simple:

data %>% 
  mutate(equal = ifelse(age == age_2 == age_3, 1, 0))

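But I guess R cannot interpret three == operators back to back, because it gives an "unexpected '=='" error. I tried to work around the problem like this: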

data %>% 
  mutate(equal = ifelse(isTRUE(all.equal(age, age_2, age_3)), 1, 0))

However, this returns FALSE for every row. What is the correct way to do this without resorting to pairwise comparisons (for example, (age == age_2) & (age_2 == age_3))?

Answer 1

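A simpler option is to compare the first of the "age" columns with the remaining ones and use rowSums to build the logical condition. A row gets TRUE only when none of its values differ from the first column, that is, when its row sum of mismatches is zero.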

nm1 <- grep("age", names(data))
data$equal <- !rowSums(data[nm1][,1] != data[nm1]) 
data$equal
#[1]  TRUE FALSE FALSE FALSE

We can also use the tidyverse:

library(tidyverse)
data %>% 
   mutate(equal = pmap(select(., starts_with('age')),
          ~ n_distinct(c(...)) == 1))
#  patientID age age2 age3 equal
#1         1  25   25   25  TRUE
#2         2  34   34   32 FALSE
#3         3  28   28   20 FALSE
#4         4  52   18   19 FALSE

Data

data <- structure(list(patientID = 1:4, age = c(25L, 34L, 28L, 52L), 
    age2 = c(25L, 34L, 28L, 18L), age3 = c(25L, 32L, 20L, 19L
    )), row.names = c("1", "2", "3", "4"), class = "data.frame")

Answer 2

Another way to do this in dplyr (using akrun's data):

library(dplyr)

data %>%
  rowwise() %>% 
  mutate(equal = +(n_distinct(c(age,age2,age3))==1))

#   patientID age age2 age3 equal
# 1         1  25   25   25     1
# 2         2  34   34   32     0
# 3         3  28   28   20     0
# 4         4  52   18   19     0

Answer 3

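Here is a more general pmap solution using the tidyverse. The select call can be adjusted to pick whichever columns are needed. pmap takes each row of the selected columns and checks whether all of its elements are equal to the first element of that row: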

library(tidyverse)

data %>%
  mutate(equal = select(., matches("age")) %>% pmap(~ all(c(...) == ..1)))

There is also apply in base R:

data$equal <- apply(data[grep("age", names(data))], 1, function(x) all(x == x[1]))

Output:

  patientID age age2 age3 equal
1         1  25   25   25  TRUE
2         2  34   34   32 FALSE
3         3  28   28   20 FALSE
4         4  52   18   19 FALSE
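
One detail worth noting about the pmap call above: pmap always returns a list, so equal ends up as a list column even though it prints as TRUE/FALSE. If an ordinary logical column is wanted, a minimal sketch using purrr's typed variant pmap_lgl could look like the following. The names age_cols and result are introduced here only for illustration, and the data frame is rebuilt from the values given in Answer 1.

```r
library(dplyr)
library(purrr)

# Same values as the data shown in Answer 1
data <- data.frame(
  patientID = 1:4,
  age  = c(25L, 34L, 28L, 52L),
  age2 = c(25L, 34L, 28L, 18L),
  age3 = c(25L, 32L, 20L, 19L)
)

# Illustrative helper: the columns to compare row by row
age_cols <- select(data, matches("age"))

# pmap_lgl() requires each call to return a single logical value and
# collects the results into a logical vector instead of a list
result <- data %>%
  mutate(equal = pmap_lgl(age_cols, ~ all(c(...) == ..1)))

is.logical(result$equal)
# [1] TRUE
result$equal
# [1]  TRUE FALSE FALSE FALSE
```

The comparison logic is unchanged; only the return type differs, which makes the column easier to use in later filter() or ifelse() calls.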
