Question

I have this data frame:

> set.seed(100)
> df <- data.frame(X1 = sample(c(1:7, NA), 10, replace=TRUE),
                 X2 = sample(c(1:7, NA), 10, replace=TRUE),
                 X3 = sample(c(1:7, NA), 10, replace=TRUE),
                 YY = sample(c("a","b"), 10, replace=TRUE),
                 stringsAsFactors = FALSE)

> df
   X1 X2 X3 YY
1   3  5  5  a
2   3 NA  6  b
3   5  3  5  a
4   1  4  6  b
5   4  7  4  b
6   4  6  2  b
7   7  2  7  a
8   3  3 NA  b
9   5  3  5  b
10  2  6  3  a

The desired final result is:

YY   X1     X2    X3
 a  -0.25  -0.25  0
 b  -0.83  -0.2   0

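The formula for each percentage is:

(counts of c(6,7) - counts of c(1,2,3,4)) / counts of c(1,2,3,4,5,6,7)

For example, to get the -0.25 for X1 and YY = a:

prom  = counts of c(6,7) = 1
detr  = counts of c(1,2,3,4) = 2
total = counts of c(1,2,3,4,5,6,7) = 4
The percentage is (prom - detr) / total = (1 - 2) / 4 = -0.25

I am trying to produce that output with a loop over each column (X1, X2, and X3). For each column, table() gives me the counts for a and b respectively:

> table(df[, "X1"], df$YY)

    a b
  1 0 1
  2 1 0
  3 1 2
  4 0 2
  5 1 1
  7 1 0

But I am struggling to access this table() and, for each YY, add up the respective counts, subtract, and then divide by the total count. I had thought of using YY to access the table and sum by condition, but I still have not found a way.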

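Is there a simpler way? Any ideas? I also tried dplyr, but it seemed even more complicated when I have to group by category, then count, sum, and divide for each column, and end up with a small output.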

Answer 1

We can create a function, get_ratio, based on the formula:

get_ratio <- function(x) {
  (sum(x %in% 6:7) - sum(x %in% 1:4))/sum(x %in% 1:7)
}
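As a quick sanity check, the function can be tried by hand on a single group. This is only a sketch: the vector `x1_a` is a hypothetical name, and its values are typed in from the X1 entries of the rows with YY = a in the question's printout (since `sample()` output can differ between R versions even with the same seed).

```r
# Same function as above, repeated so the snippet runs on its own
get_ratio <- function(x) {
  (sum(x %in% 6:7) - sum(x %in% 1:4)) / sum(x %in% 1:7)
}

# X1 values where YY == "a", copied from the printed data frame
x1_a <- c(3, 5, 7, 2)

# One value in 6:7, two values in 1:4, four values in 1:7
ratio_a <- get_ratio(x1_a)   # (1 - 2) / 4

# NA never matches %in%, so it is left out of every count
ratio_na <- get_ratio(c(3, NA, 6))   # (1 - 1) / 2
```

Because `%in%` returns FALSE for NA, no explicit `na.rm` handling is needed.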

Now apply it to each group (YY):

library(dplyr)

df %>%
  group_by(YY) %>%
  summarise_at(vars(X1:X3), get_ratio)

#   YY        X1     X2    X3
#   <chr>  <dbl>  <dbl> <dbl>
# 1 a     -0.25  -0.25      0
# 2 b     -0.833 -0.2       0
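In dplyr 1.0.0 and later, `summarise_at()` is superseded by `across()`. A minimal sketch of the same computation in that style follows. It assumes the data frame is entered literally from the question's printout rather than regenerated with `set.seed()`, and `res` is just an illustrative name.

```r
library(dplyr)

# Data typed in from the question's printout (assumption: this is the data of interest)
df <- data.frame(
  X1 = c(3, 3, 5, 1, 4, 4, 7, 3, 5, 2),
  X2 = c(5, NA, 3, 4, 7, 6, 2, 3, 3, 6),
  X3 = c(5, 6, 5, 6, 4, 2, 7, NA, 5, 3),
  YY = c("a", "b", "a", "b", "b", "b", "a", "b", "b", "a"),
  stringsAsFactors = FALSE
)

get_ratio <- function(x) {
  (sum(x %in% 6:7) - sum(x %in% 1:4)) / sum(x %in% 1:7)
}

# One row per YY group, one ratio per X column
res <- df %>%
  group_by(YY) %>%
  summarise(across(X1:X3, get_ratio))
```

The result should have one row for each value of YY and the columns X1, X2, and X3.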

Answer 2

Group by YY, then summarize every column within each group using the function shown (written in formula notation).

library(dplyr)

df %>%
  group_by(YY) %>%
  summarize_all(~ (sum(.x %in% 6:7) - sum(.x %in% 1:4)) / sum(.x %in% 1:7)) %>%
  ungroup

giving:

# A tibble: 2 x 4
  YY        X1    X2    X3
  <chr>  <dbl> <dbl> <dbl>
1 a     -0.25  -0.25     0
2 b     -0.833 -0.2      0
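For completeness, the `table()` route the question was attempting can also be made to work in base R, without dplyr. The sketch below is one possible way, not the answerers' method: `ratio_from_table` is a hypothetical helper, and the data frame is again typed in from the question's printout.

```r
# Data typed in from the question's printout (assumption: this is the data of interest)
df <- data.frame(
  X1 = c(3, 3, 5, 1, 4, 4, 7, 3, 5, 2),
  X2 = c(5, NA, 3, 4, 7, 6, 2, 3, 3, 6),
  X3 = c(5, 6, 5, 6, 4, 2, 7, NA, 5, 3),
  YY = c("a", "b", "a", "b", "b", "b", "a", "b", "b", "a"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: builds the value-by-group table, then sums rows by condition
ratio_from_table <- function(col, grp) {
  tab  <- table(col, grp)            # NA values are dropped by default
  vals <- as.numeric(rownames(tab))  # row names hold the observed values
  prom  <- colSums(tab[vals %in% 6:7, , drop = FALSE])
  detr  <- colSums(tab[vals %in% 1:4, , drop = FALSE])
  total <- colSums(tab[vals %in% 1:7, , drop = FALSE])
  (prom - detr) / total
}

# Rows are the YY groups, columns are X1, X2, X3
res <- sapply(df[c("X1", "X2", "X3")], ratio_from_table, grp = df$YY)
```

The key step is converting the table's row names back to numbers so that rows can be selected with `%in%`; `drop = FALSE` keeps the two-dimensional shape even when only one row (or none) matches.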

Answer 3

Do you want to do something like this?

 success <- tryCatch({
   remDr$navigate(html[i])
   TRUE
   }, 
   warning = function(w) { FALSE },
   error = function(e) { FALSE },
   finally = { })

if (!success) next
