Question

有人知道在缺少值时是否可以计算R中的加权平均值，当缺少值时，现有值的权重是按比例向上缩放的？

为了清楚地传达这一点，我创造了一个假设的场景。这描述了问题的根源，其中需要针对每一行调整标量，具体取决于缺少哪些值。

File: Weighted Mean Calculation in Excel

Answer 1

发布示例数据集的最佳方式是使用dput(head(dat, 20))，其中dat是数据集的名称。图形图像是一个非常糟糕的选择。
数据。

dat <- structure(list(Test1 = c(90, NA, 81), Test2 = c(91, 79, NA), Test3 = c(92, 98, 83)), .Names = c("Test1", "Test2", "Test3" ), row.names = c("Mark", "Mike", "Nick"), class = "data.frame") w <- structure(list(Test1 = c(18, NA, 27), Test2 = c(36.4, 39.5, NA ), Test3 = c(36.8, 49, 55.3)), .Names = c("Test1", "Test2", "Test3" ), row.names = c("Mark", "Mike", "Nick"), class = "data.frame")

<强> CODE。
您可以在基础包weighted.mean和stats中使用函数sapply。请注意，如果注释和权重的数据集是类matrix的R对象，则不需要unlist。

sapply(seq_len(nrow(dat)), function(i){ weighted.mean(unlist(dat[i,]), unlist(w[i, ]), na.rm = TRUE) })

Answer 2

使用带有参数weighted.mean的基础stats包中的na.rm = TRUE可以获得所需的结果。这是tidyverse方式，可以这样做：

library(tidyverse)
scores <- tribble(
 ~student, ~test1, ~test2, ~test3,
   "Mark",     90,     91,     92,
   "Mike",     NA,     79,     98,
   "Nick",     81,     NA,     83)

weights <- tribble(
  ~test,   ~weight, 
  "test1",     0.2, 
  "test2",     0.4,
  "test3",     0.4)

scores %>% 
  gather(test, score, -student) %>%
  left_join(weights, by = "test") %>%
  group_by(student) %>%
  summarise(result = weighted.mean(score, weight, na.rm = TRUE))
#> # A tibble: 3 x 2
#>   student   result
#>     <chr>    <dbl>
#> 1    Mark 91.20000
#> 2    Mike 88.50000
#> 3    Nick 82.33333

R中的加权平均值计算与缺失值

2 个答案: