I have this data frame:
> set.seed(100)
> df <- data.frame(X1 = sample(c(1:7, NA), 10, replace = TRUE),
                   X2 = sample(c(1:7, NA), 10, replace = TRUE),
                   X3 = sample(c(1:7, NA), 10, replace = TRUE),
                   YY = sample(c("a", "b"), 10, replace = TRUE),
                   stringsAsFactors = FALSE)
> df
   X1 X2 X3 YY
1   3  5  5  a
2   3 NA  6  b
3   5  3  5  a
4   1  4  6  b
5   4  7  4  b
6   4  6  2  b
7   7  2  7  a
8   3  3 NA  b
9   5  3  5  b
10  2  6  3  a
The final result should look like this:
YY XX
a -0.17
b -0.38
The formula for each percentage is:
(counts of c(6,7) - counts of c(1,2,3,4)) / counts of c(1,2,3,4,5,6,7)
For example, to get -0.17 for `a`:
Taking all the columns (`X1`, `X2`, `X3`) in the rows where `YY = a`:
prom = counts of c(6,7) = 3
detr = counts of c(1,2,3,4) = 5
total = counts of c(1,2,3,4,5,6,7) = 12
The percentage is (prom - detr) / total = (3 - 5) / 12 = -0.17
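As a quick check of this arithmetic, the pooled values for `a` can be typed in by hand from the printed data frame (rather than regenerated with `set.seed(100)`, which may draw different values under other R versions). The names `vals_a` and `xx_a` are hypothetical, introduced only for this sketch:

```r
# Values of X1, X2 and X3 in the rows where YY == "a"
# (rows 1, 3, 7 and 10 of the data frame printed above, typed in by hand)
vals_a <- c(3, 5, 5,   # row 1
            5, 3, 5,   # row 3
            7, 2, 7,   # row 7
            2, 6, 3)   # row 10

prom  <- sum(vals_a %in% 6:7)   # 3 values are 6 or 7
detr  <- sum(vals_a %in% 1:4)   # 5 values are between 1 and 4
total <- sum(vals_a %in% 1:7)   # 12 values are between 1 and 7
xx_a  <- (prom - detr) / total  # (3 - 5) / 12
round(xx_a, 2)                  # -0.17
```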
However, with summarize_all() I can only do the calculation column by column:
library(dplyr)

df %>%
  group_by(YY) %>%
  summarize_all(~ (sum(.x %in% 6:7) - sum(.x %in% 1:4)) / sum(.x %in% 1:7))
YY X1 X2 X3
<chr> <dbl> <dbl> <dbl>
1 a -0.333 -1 0.333
2 b 0.167 -0.714 -0.667
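What I need instead is the calculation over all columns together for each category of YY, rather than column by column (as in the desired output above).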
Answer 0 (score: 3)
You could try:
library(tidyverse)

df %>%
  gather(key, val, -YY) %>%
  group_by(YY) %>%
  summarise(
    XX = (sum(val %in% 6:7) - sum(val %in% 1:4)) / sum(val %in% 1:7)
  )
Output:
# A tibble: 2 x 2
YY XX
<chr> <dbl>
1 a -0.167
2 b -0.375
Answer 1 (score: 3)
Try `melt`:
library(reshape2)
library(dplyr)

melt(df, 'YY') %>%
  group_by(YY) %>%
  summarise(XX = (sum(value %in% 6:7) - sum(value %in% 1:4)) / sum(value %in% 1:7))
# A tibble: 2 x 2
YY XX
<chr> <dbl>
1 a -0.714285714285714
2 b 0.105263157894737
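Both answers pool the columns by reshaping to long format before summarising. The same idea can be sketched in base R with no extra packages. This assumes the data frame exactly as printed in the question (typed in by hand), and `score`, `cols` and `long` are hypothetical names introduced only for illustration:

```r
# Data frame typed in from the printed output in the question
df <- data.frame(
  X1 = c(3, 3, 5, 1, 4, 4, 7, 3, 5, 2),
  X2 = c(5, NA, 3, 4, 7, 6, 2, 3, 3, 6),
  X3 = c(5, 6, 5, 6, 4, 2, 7, NA, 5, 3),
  YY = c("a", "b", "a", "b", "b", "b", "a", "b", "b", "a"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: (count of 6-7 minus count of 1-4) / count of 1-7
score <- function(v) (sum(v %in% 6:7) - sum(v %in% 1:4)) / sum(v %in% 1:7)

# Long format, like gather(key, val, -YY): X1 values, then X2, then X3,
# each paired with the YY value of its row
cols <- c("X1", "X2", "X3")
long <- data.frame(
  YY  = rep(df$YY, times = length(cols)),
  val = unlist(df[cols], use.names = FALSE)
)

# Apply the score to all pooled values of each YY group
XX <- tapply(long$val, long$YY, score)
round(XX, 3)  # a: -0.167, b: -0.375
```

`tapply()` splits the pooled values by `YY` and applies `score` to each group. Missing values drop out on their own because `NA %in% 1:7` is `FALSE`.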