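How do I extract matches across multiple rows?

Asked: 2019-02-22 21:22:01

Tags: r dataframe

I'm still learning R and have a basic question. I have a data frame (df) that looks like this: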

    Factor   P1    P2    P3    P4    P5
 1      A  TRUE FALSE  TRUE FALSE  TRUE
 2      A FALSE  TRUE FALSE FALSE  TRUE
 3      B  TRUE  TRUE  TRUE FALSE FALSE
 4      B  TRUE FALSE FALSE  TRUE FALSE
 5      C FALSE FALSE FALSE  TRUE FALSE
 6      C  TRUE  TRUE FALSE FALSE FALSE

df = data.frame("Factor" = c("A","A","B","B","C","C"),
            "P1" = c("TRUE","FALSE","TRUE","TRUE","FALSE","TRUE"),
            "P2" =c("FALSE","TRUE","TRUE","FALSE","FALSE","TRUE"),
            "P3" = c("TRUE","FALSE","TRUE","FALSE","FALSE","FALSE"), 
            "P4" = c("FALSE","FALSE","FALSE","TRUE","TRUE","FALSE"), 
            "P5" = c("TRUE","TRUE","FALSE","FALSE","FALSE","FALSE"))

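I want to collapse df on Factor so that whenever TRUE appears in either of the two rows for the same Factor, it is reported as TRUE in the collapsed row for that Factor.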

Can anyone help? Thanks!

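2 answers:

Answer 0: (score: 2)

There are two steps here:

  1. Convert all the P columns to logical vectors using as.logical
  2. Group by Factor, then use any to check, for each P column, whether any value within each Factor is TRUE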
library(tidyverse)
df <- data.frame(
  "Factor" = c("A", "A", "B", "B", "C", "C"),
  "P1" = c("TRUE", "FALSE", "TRUE", "TRUE", "FALSE", "TRUE"),
  "P2" = c("FALSE", "TRUE", "TRUE", "FALSE", "FALSE", "TRUE"),
  "P3" = c("TRUE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE"),
  "P4" = c("FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE"),
  "P5" = c("TRUE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE")
)
df %>%
  mutate_at(vars(-Factor), as.logical) %>%
  group_by(Factor) %>%
  summarise_all(any)
#> # A tibble: 3 x 6
#>   Factor P1    P2    P3    P4    P5   
#>   <fct>  <lgl> <lgl> <lgl> <lgl> <lgl>
#> 1 A      TRUE  TRUE  TRUE  FALSE TRUE 
#> 2 B      TRUE  TRUE  TRUE  TRUE  FALSE
#> 3 C      TRUE  TRUE  FALSE TRUE  FALSE

Created on 2019-02-22 by the reprex package (v0.2.1)
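The same two steps can also be sketched with across(), which newer dplyr releases offer in place of the scoped mutate_at / summarise_all verbs. This is a minimal sketch rather than part of the original answer, and it assumes dplyr 1.0.0 or later is installed; the name result is just for illustration.

```r
library(dplyr)

# Same data as in the question: logical values stored as strings.
df <- data.frame(
  Factor = c("A", "A", "B", "B", "C", "C"),
  P1 = c("TRUE", "FALSE", "TRUE", "TRUE", "FALSE", "TRUE"),
  P2 = c("FALSE", "TRUE", "TRUE", "FALSE", "FALSE", "TRUE"),
  P3 = c("TRUE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE"),
  P4 = c("FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE"),
  P5 = c("TRUE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE")
)

# Convert and collapse in one step: within each Factor, a P column is
# TRUE if any of its values parses as TRUE.
# across(everything()) skips the grouping column automatically.
result <- df %>%
  group_by(Factor) %>%
  summarise(across(everything(), ~ any(as.logical(.x))))

print(result)
```

Doing the as.logical conversion inside the summary keeps df itself untouched, which may or may not be what you want.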

Answer 1: (score: 1)

Another tidyverse option could be:

df %>%
 gather(var, val, -Factor) %>%
 group_by(Factor, var) %>%
 mutate(val = any(as.logical(val))) %>%  # as.logical() is needed because df stores the values as strings
 distinct() %>%
 spread(var, val)

  Factor P1    P2    P3    P4    P5   
  <fct>  <lgl> <lgl> <lgl> <lgl> <lgl>
1 A      TRUE  TRUE  TRUE  FALSE TRUE 
2 B      TRUE  TRUE  TRUE  TRUE  FALSE
3 C      TRUE  TRUE  FALSE TRUE  FALSE

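First, it transforms the data from wide to long format, excluding the "Factor" variable. Second, it groups by "Factor" and the variable names. Third, it checks the condition. Finally, it removes the duplicate rows and returns the data to wide format.

Or, based on @Calum You's idea: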
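The wide-to-long-to-wide round trip described above can also be sketched with pivot_longer() and pivot_wider(), the newer tidyr replacements for gather() and spread(). This is an illustrative variant, not the answerer's code, and it assumes tidyr 1.0.0 or later and dplyr 1.0.0 or later. It uses summarise() instead of mutate() plus distinct(), since summarising already leaves one row per group.

```r
library(dplyr)
library(tidyr)

# Same data as in the question: logical values stored as strings.
df <- data.frame(
  Factor = c("A", "A", "B", "B", "C", "C"),
  P1 = c("TRUE", "FALSE", "TRUE", "TRUE", "FALSE", "TRUE"),
  P2 = c("FALSE", "TRUE", "TRUE", "FALSE", "FALSE", "TRUE"),
  P3 = c("TRUE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE"),
  P4 = c("FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE"),
  P5 = c("TRUE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE")
)

result <- df %>%
  # Wide to long: one row per (Factor, var) observation.
  pivot_longer(-Factor, names_to = "var", values_to = "val") %>%
  group_by(Factor, var) %>%
  # One row per group: TRUE if any value in the group is TRUE.
  summarise(val = any(as.logical(val)), .groups = "drop") %>%
  # Long back to wide: one column per original P variable.
  pivot_wider(names_from = var, values_from = val)

print(result)
```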

df %>%
 mutate_at(vars(-Factor), as.logical) %>%
 group_by(Factor) %>%
 summarise_all(~ sum(.) > 0)  # formula lambda; funs() is deprecated since dplyr 0.8.0

Or:

df %>%
 mutate_at(vars(-Factor), as.logical) %>%
 group_by(Factor) %>%
 summarise_all(~ !all(!.))  # formula lambda; funs() is deprecated since dplyr 0.8.0
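The three summaries used so far, any(x), sum(x) > 0 and !all(!x), all ask "is at least one element TRUE?". The base R sketch below checks that they agree on inputs without missing values, and shows that sum(x) > 0 can behave differently once an NA is present. The helper names (checks, inputs, agree) are made up for this illustration.

```r
# Three ways of asking "is at least one element TRUE?"
checks <- list(
  any      = function(x) any(x),
  sum_gt_0 = function(x) sum(x) > 0,
  not_all  = function(x) !all(!x)
)

# On NA-free inputs (including the empty vector) all three agree.
inputs <- list(c(TRUE, FALSE), c(FALSE, FALSE), c(TRUE, TRUE), logical(0))
agree <- vapply(inputs, function(x) {
  res <- vapply(checks, function(f) f(x), logical(1))
  length(unique(res)) == 1
}, logical(1))
print(agree)

# With a missing value the answers can differ.
x_na <- c(NA, TRUE)
any_na <- any(x_na)      # TRUE: one known TRUE is enough
sum_na <- sum(x_na) > 0  # NA: the sum itself is NA
print(c(any_na, sum_na))
```

The data in the question contains no missing values, so the choice among the three is a matter of taste here.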

With base R:

x <- cbind(df[, 1], data.frame(apply(df[, -1], 2, function(x) as.logical(x))))
colnames(x) <- colnames(df)

aggregate(. ~ Factor, x, function(x) any(x))

  Factor   P1   P2    P3    P4    P5
1      A TRUE TRUE  TRUE FALSE  TRUE
2      B TRUE TRUE  TRUE  TRUE FALSE
3      C TRUE TRUE FALSE  TRUE FALSE

Or:

aggregate(. ~ Factor, x, function(x) sum(x) > 0)

Or:

aggregate(. ~ Factor, x, function(x) !all(!x))
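As a variation on the base R approach, the conversion step could be written with lapply() so that the column names are kept and no cbind / colnames fix-up is needed. This is a sketch of an alternative, not the answerer's code; x and result are illustrative names.

```r
# Same data as in the question: logical values stored as strings.
df <- data.frame(
  Factor = c("A", "A", "B", "B", "C", "C"),
  P1 = c("TRUE", "FALSE", "TRUE", "TRUE", "FALSE", "TRUE"),
  P2 = c("FALSE", "TRUE", "TRUE", "FALSE", "FALSE", "TRUE"),
  P3 = c("TRUE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE"),
  P4 = c("FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE"),
  P5 = c("TRUE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE")
)

# Convert every column except the first (Factor) to logical in place;
# assigning into x[-1] preserves the column names.
x <- df
x[-1] <- lapply(x[-1], as.logical)

# Collapse to one row per Factor: TRUE if any row in the group is TRUE.
result <- aggregate(. ~ Factor, data = x, FUN = any)
print(result)
```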