Using the result of a subquery in a conditional mutate

Asked: 2018-01-22 13:14:53

Tags: r filter dplyr subquery mutate

I want to compute a new column based on the result of a subquery against the same data frame. Minimal (non-)working example:

library(plyr)
library(dplyr)

df <- data.frame(
  VAR1 = c("A", "A", "B", "C"),
  VAR2 = c("F", "G", "E", "D"),
  VAR3 = c("G", "F", "X", "D")
) %>% as_tibble

subquery <- function(v1, v2) {
  dplyr::filter(df, as.character(v1) == VAR1, as.character(v2) == VAR2)
}

TEST <-
  df %>%
  mutate(X = case_when(
    plyr::empty(subquery(VAR1, VAR3)) ~ "EMPTY",
    TRUE ~ "NON EMPTY"
  ))

The resulting data frame TEST should be

VAR1   VAR2   VAR3   X        
<fctr> <fctr> <fctr> <chr>    
A      F      G      NON EMPTY
A      G      F      NON EMPTY
B      E      X      EMPTY
C      D      D      NON EMPTY

but it is

VAR1   VAR2   VAR3   X        
<fctr> <fctr> <fctr> <chr>    
A      F      G      NON EMPTY
A      G      F      NON EMPTY
B      E      X      NON EMPTY
C      D      D      NON EMPTY

Thanks a lot in advance!

Note: if I do not coerce v1 and v2 to character, I get the following error:

Error in mutate_impl(.data, dots) : 
  Evaluation error: Evaluation error: level sets of factors are different..

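1 answer:

Answer 0 (score: 2)

I would put the empty function inside the subquery function so that it returns a single TRUE or FALSE value. The function can then be vectorized so that it is applied to each row of the data frame: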

library(plyr)
library(dplyr)

df <- data.frame(
  VAR1 = c("A", "A", "B", "C"),
  VAR2 = c("F", "G", "E", "D"),
  VAR3 = c("G", "F", "X", "D")
) %>% as_tibble

subquery <- function(v1, v2) {
  empty(filter(df, as.character(v1) == VAR1, as.character(v2) == VAR2))
}

subquery = Vectorize(subquery)

df %>%
  mutate(X = case_when(
    subquery(VAR1, VAR3) == FALSE ~ "NON EMPTY",
    TRUE ~ "EMPTY"
  ))

# # A tibble: 4 x 4
#   VAR1  VAR2  VAR3  X        
#   <fct> <fct> <fct> <chr>    
# 1 A     F     G     NON EMPTY
# 2 A     G     F     NON EMPTY
# 3 B     E     X     EMPTY    
# 4 C     D     D     NON EMPTY
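
To see why the vectorization matters, it may help to compare what the function returns when it receives whole columns (which is how mutate passes them) with what it returns when called once per pair of values. The sketch below is only an illustration under a few assumptions: `no_match` is a hypothetical helper name, `nrow(...) == 0` stands in for `plyr::empty()`, and the columns are plain character vectors rather than factors.

```r
library(dplyr)

df <- tibble(
  VAR1 = c("A", "A", "B", "C"),
  VAR2 = c("F", "G", "E", "D"),
  VAR3 = c("G", "F", "X", "D")
)

# Hypothetical helper: TRUE when no row of df matches the pair (v1, v2).
# nrow(...) == 0 is used here in place of plyr::empty().
no_match <- function(v1, v2) {
  nrow(dplyr::filter(df, VAR1 == v1, VAR2 == v2)) == 0
}

# Whole columns in, one value out: the comparison is element-wise,
# row 4 (C, D) survives the filter, so the single answer is FALSE
# and case_when() recycles it to every row.
whole_column <- no_match(df$VAR1, df$VAR3)

# One call per pair of elements, which is roughly what Vectorize()
# arranges through mapply(); unname() drops the names mapply() adds.
per_row <- unname(mapply(no_match, df$VAR1, df$VAR3))

print(whole_column)
print(per_row)
```

Only the per-row version yields one logical value for each row, which is what case_when needs inside mutate.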

Alternatively, you can put both empty and case_when inside the subquery function, like this:

subquery <- function(v1, v2) {
  res = empty(filter(df, as.character(v1) == VAR1, as.character(v2) == VAR2))
  case_when(res == FALSE ~ "NON EMPTY",
            TRUE ~ "EMPTY")
}

subquery = Vectorize(subquery)

df %>% mutate(X = subquery(VAR1, VAR3))
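
If calling filter once per row becomes slow on larger data, the same check can probably be expressed as a join. This is a different technique from the Vectorize approach above, offered only as a sketch: the names `lookup` and `found` are made up for illustration, and the columns are assumed to be character vectors (with factor columns the join may coerce them and emit warnings).

```r
library(dplyr)

df <- tibble(
  VAR1 = c("A", "A", "B", "C"),
  VAR2 = c("F", "G", "E", "D"),
  VAR3 = c("G", "F", "X", "D")
)

# Every (VAR1, VAR2) pair that exists in df, with VAR2 renamed to VAR3
# so it can be matched against the VAR3 column of each row.
lookup <- df %>%
  distinct(VAR1, VAR2) %>%
  transmute(VAR1, VAR3 = VAR2, found = TRUE)

# Rows without a matching pair get found = NA after the left join.
result <- df %>%
  left_join(lookup, by = c("VAR1", "VAR3")) %>%
  mutate(X = if_else(is.na(found), "EMPTY", "NON EMPTY")) %>%
  select(-found)

print(result)
```

Because the lookup table is deduplicated with distinct, the join keeps one output row per input row, in the original order.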