I'm still learning R and have a basic question. I have a data frame (df) that looks like this:
Factor P1 P2 P3 P4 P5
1 A TRUE FALSE TRUE FALSE TRUE
2 A FALSE TRUE FALSE FALSE TRUE
3 B TRUE TRUE TRUE FALSE FALSE
4 B TRUE FALSE FALSE TRUE FALSE
5 C FALSE FALSE FALSE TRUE FALSE
6 C TRUE TRUE FALSE FALSE FALSE
df = data.frame("Factor" = c("A","A","B","B","C","C"),
"P1" = c("TRUE","FALSE","TRUE","TRUE","FALSE","TRUE"),
"P2" =c("FALSE","TRUE","TRUE","FALSE","FALSE","TRUE"),
"P3" = c("TRUE","FALSE","TRUE","FALSE","FALSE","FALSE"),
"P4" = c("FALSE","FALSE","FALSE","TRUE","TRUE","FALSE"),
"P5" = c("TRUE","TRUE","FALSE","FALSE","FALSE","FALSE"))
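I would like to collapse df on the shared Factor, so that whenever TRUE appears in either of the rows with the same Factor, it is reported as TRUE for that Factor. Like this:
Factor P1 P2 P3 P4 P5
1 A TRUE TRUE TRUE FALSE TRUE
2 B TRUE TRUE TRUE TRUE FALSE
3 C TRUE TRUE FALSE TRUE FALSE
Can anyone help? Thanks!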
Answer 0 (score: 2)
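There are two steps here:
1. Convert all the P columns to logical vectors with as.logical.
2. Group on Factor, then use any to check whether any value of each P column is TRUE within each Factor.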
library(tidyverse)
df <- data.frame("Factor" = c("A", "A", "B", "B", "C", "C"), "P1" = c("TRUE", "FALSE", "TRUE", "TRUE", "FALSE", "TRUE"), "P2" = c("FALSE", "TRUE", "TRUE", "FALSE", "FALSE", "TRUE"), "P3" = c("TRUE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE"), "P4" = c("FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE"), "P5" = c("TRUE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE"))
df %>%
mutate_at(vars(-Factor), as.logical) %>%
group_by(Factor) %>%
summarise_all(any)
#> # A tibble: 3 x 6
#> Factor P1 P2 P3 P4 P5
#> <fct> <lgl> <lgl> <lgl> <lgl> <lgl>
#> 1 A TRUE TRUE TRUE FALSE TRUE
#> 2 B TRUE TRUE TRUE TRUE FALSE
#> 3 C TRUE TRUE FALSE TRUE FALSE
Created on 2019-02-22 by the reprex package (v0.2.1)
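The same two steps can also be written with across(), assuming dplyr 1.0.0 or later, where the scoped verbs mutate_at() and summarise_all() are superseded. This is a minimal sketch and not part of the original answer; the name result is only illustrative.

```r
library(dplyr)

# Same data as in the question: the P columns hold the strings "TRUE"/"FALSE"
df <- data.frame(
  Factor = c("A", "A", "B", "B", "C", "C"),
  P1 = c("TRUE", "FALSE", "TRUE", "TRUE", "FALSE", "TRUE"),
  P2 = c("FALSE", "TRUE", "TRUE", "FALSE", "FALSE", "TRUE"),
  P3 = c("TRUE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE"),
  P4 = c("FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE"),
  P5 = c("TRUE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE")
)

result <- df %>%
  # Step 1: convert every column except Factor to logical
  mutate(across(-Factor, as.logical)) %>%
  # Step 2: one row per Factor, TRUE if any row in the group is TRUE
  group_by(Factor) %>%
  summarise(across(everything(), any))

print(result)
```

Inside summarise(), across(everything(), ...) skips the grouping column, so only P1 to P5 are collapsed.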
Answer 1 (score: 1)
Another tidyverse option could be:
df %>%
gather(var, val, -Factor) %>%
group_by(Factor, var) %>%
# as.logical() is needed when the columns hold the strings "TRUE"/"FALSE", as in the question's df
mutate(val = any(as.logical(val))) %>%
distinct() %>%
spread(var, val)
Factor P1 P2 P3 P4 P5
<fct> <lgl> <lgl> <lgl> <lgl> <lgl>
1 A TRUE TRUE TRUE FALSE TRUE
2 B TRUE TRUE TRUE TRUE FALSE
3 C TRUE TRUE FALSE TRUE FALSE
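First, it reshapes the data from wide to long, excluding the "Factor" variable. Second, it groups by "Factor" and the variable names. Third, it checks the condition. Finally, it removes the duplicate rows and spreads the data back to wide format.
Or, building on @Calum You's idea: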
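The wide-to-long-to-wide steps described above can be sketched with pivot_longer() and pivot_wider(), assuming tidyr 1.0.0 or later and dplyr 1.0.0 or later (for the .groups argument). This is an illustrative variant rather than the answer's own code; summarise() replaces the mutate() plus distinct() pair, and the names long and result are hypothetical.

```r
library(dplyr)
library(tidyr)

# Same data as in the question: the P columns hold the strings "TRUE"/"FALSE"
df <- data.frame(
  Factor = c("A", "A", "B", "B", "C", "C"),
  P1 = c("TRUE", "FALSE", "TRUE", "TRUE", "FALSE", "TRUE"),
  P2 = c("FALSE", "TRUE", "TRUE", "FALSE", "FALSE", "TRUE"),
  P3 = c("TRUE", "FALSE", "TRUE", "FALSE", "FALSE", "FALSE"),
  P4 = c("FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE"),
  P5 = c("TRUE", "TRUE", "FALSE", "FALSE", "FALSE", "FALSE")
)

# Wide to long: one row per (Factor, var) observation
long <- df %>%
  pivot_longer(-Factor, names_to = "var", values_to = "val")

result <- long %>%
  group_by(Factor, var) %>%
  # Collapse each (Factor, var) group to a single logical value
  summarise(val = any(as.logical(val)), .groups = "drop") %>%
  # Long back to wide: one column per original P variable
  pivot_wider(names_from = var, values_from = val)

print(result)
```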
df %>%
mutate_at(vars(-Factor), as.logical) %>%
group_by(Factor) %>%
summarise_all(funs(sum(.) > 0))
Or:
df %>%
mutate_at(vars(-Factor), as.logical) %>%
group_by(Factor) %>%
summarise_all(funs(!all(!.)))
In base R:
x <- cbind(df[, 1], data.frame(apply(df[, -1], 2, function(x) as.logical(x))))
colnames(x) <- colnames(df)
aggregate(. ~ Factor, x, function(x) any(x))
Factor P1 P2 P3 P4 P5
1 A TRUE TRUE TRUE FALSE TRUE
2 B TRUE TRUE TRUE TRUE FALSE
3 C TRUE TRUE FALSE TRUE FALSE
Or:
aggregate(. ~ Factor, x, function(x) sum(x) > 0)
Or:
aggregate(. ~ Factor, x, function(x) !all(!x))
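The three predicates used above, any(x), sum(x) > 0 and !all(!x), agree on logical vectors with no missing values. A small sketch of that equivalence follows, along with the one case where they can differ (an NA in the data, which the question's df does not contain). The vectors v and w are made up for illustration.

```r
# No missing values: all three forms give the same answer
v <- c(FALSE, TRUE, FALSE)
no_na <- c(any(v), sum(v) > 0, !all(!v))
print(no_na)

# With an NA, sum() propagates the NA while any() and all() can still decide
w <- c(TRUE, NA)
with_na <- c(any(w), sum(w) > 0, !all(!w))
print(with_na)

# Passing na.rm = TRUE makes any() and sum() ignore the missing value
print(c(any(w, na.rm = TRUE), sum(w, na.rm = TRUE) > 0))
```

If the real data might contain NA, any(x, na.rm = TRUE) is probably the clearest of the three to read.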