Question

library(tidyverse) # library(dplyr) # would probably work, too.

Given the data frame:

my_df <- data.frame(run = c(1,2,3,4),
                w_t = c(5.452595, 4.719883, 5.110823, 5.009686),
                L = c(4.212980, 4.674020, 3.849464, 3.971810), 
                mu = c(0.9962918, 1.0141293, 0.9962637, 0.9954),
                n = c(4,4,4,4))

Note that this is a small subset of the actual dataset (more columns are not shown). I want to generate some statistics and am using dplyr to do so:

my_stats <- my_df %>% 
  ungroup() %>% 
  select(w_t, L, mu) %>%  
  summarise_each(funs(mean, sd, min, max))

This produces a data frame with 12 columns named in the format colname_stat.

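My question: is there a way to insert a function into summarise_each so that the result also includes the 95% confidence interval? That is, it would look like summarise_each(funs(mean, sd, min, max, blah)), where blah is a function I call or an equation I enter. It might have to be two parts: one equation for the lower bound and one for the upper bound.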

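I have written a function that gives me the half-width of the confidence interval, but I haven't figured out how to make it work inside the funs() call of that statement. It looks like this: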

my_ct <- function(s, n, ci){
  # you must enter the ci in decimal e.g. .95
  z_t <- qt( 1-(1-ci)/2, df = n-1)
  h <- z_t * s/sqrt(n)
  return(h)
}
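As a quick check of the function above, here is a minimal base-R sketch (assuming only the w_t values from my_df) that turns the half-width into lower and upper bounds and compares them with t.test(), whose one-sample interval uses the same mean ± t * s / sqrt(n) formula:

```r
# Asker's half-width function, repeated so this block runs on its own
my_ct <- function(s, n, ci) {
  # ci is a decimal, e.g. 0.95; qt() gives the two-sided t critical value
  z_t <- qt(1 - (1 - ci) / 2, df = n - 1)
  z_t * s / sqrt(n)
}

# w_t column from my_df above
w_t <- c(5.452595, 4.719883, 5.110823, 5.009686)

# Half-width, then mean minus/plus half-width gives the interval
h <- my_ct(s = sd(w_t), n = length(w_t), ci = 0.95)
ci_manual <- c(lower = mean(w_t) - h, upper = mean(w_t) + h)

# One-sample t.test() reports the same two-sided t interval
ci_ttest <- t.test(w_t, conf.level = 0.95)$conf.int

print(ci_manual)
print(all.equal(unname(ci_manual), as.numeric(ci_ttest)))
```

Because my_ct() needs s and n passed in separately, it is awkward to drop straight into funs(); computing them from the column itself avoids that.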

I'm arranging the data this way for comparison; a data frame gives me a flexible format.

Answer 1

How about this? Instead of passing in n or s, you can compute them inside the function:

get_CI_half_width <- function(x, prob) {
  n <- length(x)
  z_t <- qt(1 - (1 - prob) / 2, df = n - 1)
  z_t * sd(x) / sqrt(n)
}

lower <- function(x, prob = 0.95) {
  mean(x) - get_CI_half_width(x, prob)
}

upper <- function(x, prob = 0.95) {
  mean(x) + get_CI_half_width(x, prob)
}

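my_df %>% 
  ungroup() %>% 
  select(w_t, L, mu) %>%  
  # a named list() replaces the deprecated funs(); names stay colname_stat
  summarise_all(list(mean = mean, sd = sd, min = min, max = max,
                     lower = lower, upper = upper))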

Gives:

  w_t_mean   L_mean  mu_mean   w_t_sd      L_sd       mu_sd  w_t_min    L_min mu_min  w_t_max   L_max   mu_max w_t_lower  L_lower
1 5.073247 4.177068 1.000521 0.302337 0.3640999 0.009081505 4.719883 3.849464 0.9954 5.452595 4.67402 1.014129  4.592161 3.597704
   mu_lower w_t_upper  L_upper mu_upper
1 0.9860705  5.554332 4.756433 1.014972
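In current dplyr, summarise_each() and funs() are deprecated and summarise_all() is superseded by across(). A sketch of the same summary with across(), reusing the answer's lower() and upper() helpers, might look like the following. It assumes dplyr 1.0.0 or later is installed; note that across() orders the output by column (w_t_mean, w_t_sd, ...) rather than by statistic, so the column order differs from the output above.

```r
library(dplyr)

my_df <- data.frame(run = c(1, 2, 3, 4),
                    w_t = c(5.452595, 4.719883, 5.110823, 5.009686),
                    L   = c(4.212980, 4.674020, 3.849464, 3.971810),
                    mu  = c(0.9962918, 1.0141293, 0.9962637, 0.9954),
                    n   = c(4, 4, 4, 4))

# Helpers from the answer: n and s are computed from the column itself
get_CI_half_width <- function(x, prob) {
  n <- length(x)
  z_t <- qt(1 - (1 - prob) / 2, df = n - 1)
  z_t * sd(x) / sqrt(n)
}
lower <- function(x, prob = 0.95) mean(x) - get_CI_half_width(x, prob)
upper <- function(x, prob = 0.95) mean(x) + get_CI_half_width(x, prob)

# across() names outputs "{.col}_{.fn}", e.g. w_t_lower
my_stats <- my_df %>%
  summarise(across(c(w_t, L, mu),
                   list(mean = mean, sd = sd, min = min, max = max,
                        lower = lower, upper = upper)))

print(my_stats)
```

To use a different confidence level, wrap a helper in a lambda inside the list, e.g. lower90 = ~ lower(.x, prob = 0.90).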

R: Using 'funs' in `summarise_each` to generate confidence intervals
