Forgive me, this is my first time asking a question online.
First, some data to make the question concrete:
location <- c(1, 2, 3, 4)
numerator_estimate <- c(625, 180, 210, 1753)
numerator_variance <- c(22165, 2451, 11610, 172968)
denominator_estimate <- c(2278 , 4742, 1115, 26892)
denominator_variance <- c(15870, 688, 7172, 1908288)
my_df <-data.frame(location, numerator_estimate, numerator_variance, denominator_estimate, denominator_variance)
This function simulates the SE of a quotient from the estimates and variances of the numerator and denominator:
calculate_quotient_se <- function(numerator_estimate_f, numerator_variance_f, denominator_estimate_f, denominator_variance_f, iterations = 10000){
numerator_sim <- rnorm(n = iterations, mean = numerator_estimate_f, sd = sqrt(numerator_variance_f))
denominator_sim <- rnorm(n = iterations, mean = denominator_estimate_f, sd = sqrt(denominator_variance_f))
quotient_sim <- numerator_sim/denominator_sim
quotient_sim_se <- sd(quotient_sim)
return(quotient_sim_se)
}
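This function calculates the quotient. It is included to show that calculate_quotient_se does not work in the pipeline below, while another function does: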
calculate_quotient <- function(numerator_estimate_f,denominator_estimate_f){
quotient <- numerator_estimate_f/denominator_estimate_f
}
library(dplyr)  # provides %>% and mutate()
my_df2 <- my_df %>%
  mutate(quotient_se = calculate_quotient_se(numerator_estimate, numerator_variance, denominator_estimate, denominator_variance, iterations = 10000),
         quotient = calculate_quotient(numerator_estimate, denominator_estimate))
my_df2
Note that quotient_se seems to be computed only for the first row, and every additional row just repeats that value.
This does not work either:
my_df$q_se <- calculate_quotient_se(numerator_estimate, numerator_variance, denominator_estimate, denominator_variance, iterations = 10000)
my_df
It does work if I type everything in by hand like this:
(x1 <- calculate_quotient_se(625, 22165, 2278, 15870))
(x2 <- calculate_quotient_se(180, 2451, 4742, 688))
(x3 <- calculate_quotient_se(210, 11610, 1115, 7172))
(x4 <- calculate_quotient_se(1753, 172968, 26892, 1908288))
Any suggestions on how to get the simulated SE into the data frame so I can use it in further calculations?
Answer 0 (score: 0)
my_df$quotient_se <-
apply(my_df, 1, function(x) calculate_quotient_se(x[2], x[3], x[4], x[5]))
my_df$quotient <-
apply(my_df, 1, function(x) calculate_quotient(x[2],x[4]))
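For context on why the original call returns a single repeated value: calculate_quotient_se ends in sd(), which reduces all simulated draws to one number. When whole columns are passed in, rnorm() recycles the four means and standard deviations across the 10,000 draws, so the result is one pooled value that then gets recycled to every row. The sketch below shows this and a row-wise base R alternative with mapply(), which passes the columns by name rather than by position. The seed and the name pooled are assumptions added for illustration.

```r
# Data and function as defined in the question
my_df <- data.frame(
  location = c(1, 2, 3, 4),
  numerator_estimate = c(625, 180, 210, 1753),
  numerator_variance = c(22165, 2451, 11610, 172968),
  denominator_estimate = c(2278, 4742, 1115, 26892),
  denominator_variance = c(15870, 688, 7172, 1908288)
)

calculate_quotient_se <- function(numerator_estimate_f, numerator_variance_f,
                                  denominator_estimate_f, denominator_variance_f,
                                  iterations = 10000) {
  numerator_sim <- rnorm(n = iterations, mean = numerator_estimate_f,
                         sd = sqrt(numerator_variance_f))
  denominator_sim <- rnorm(n = iterations, mean = denominator_estimate_f,
                           sd = sqrt(denominator_variance_f))
  sd(numerator_sim / denominator_sim)
}

set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# Passing whole columns: the draws mix all four rows and sd() returns ONE number
pooled <- calculate_quotient_se(my_df$numerator_estimate, my_df$numerator_variance,
                                my_df$denominator_estimate, my_df$denominator_variance)
length(pooled)

# mapply() calls the function once per row, with one element from each column
my_df$quotient_se <- mapply(
  calculate_quotient_se,
  my_df$numerator_estimate, my_df$numerator_variance,
  my_df$denominator_estimate, my_df$denominator_variance,
  MoreArgs = list(iterations = 10000)
)
my_df
```

Referring to columns by name also keeps the call correct if columns are later added or reordered, which positional indexes such as x[2] do not guarantee.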
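Answer 1 (score: 0)
If you have a function that is not vectorized, you can apply it across the rows of a data set with purrr::pmap, which iterates the function in parallel (the p) over the elements of a list (in this case, a data frame). Since you want the result simplified to a numeric vector, use the pmap_dbl version: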
library(tidyverse)
set.seed(47) # make sampling reproducible
my_df <- tibble(location = c(1, 2, 3, 4),  # tibble() replaces the deprecated data_frame()
numerator_estimate = c(625, 180, 210, 1753),
numerator_variance = c(22165, 2451, 11610, 172968),
denominator_estimate = c(2278 , 4742, 1115, 26892),
denominator_variance = c(15870, 688, 7172, 1908288))
calculate_quotient_se <- function(numerator_estimate_f, numerator_variance_f, denominator_estimate_f, denominator_variance_f,
iterations = 10000){
numerator_sim <- rnorm(n = iterations, mean = numerator_estimate_f, sd = sqrt(numerator_variance_f))
denominator_sim <- rnorm(n = iterations, mean = denominator_estimate_f, sd = sqrt(denominator_variance_f))
quotient_sim <- numerator_sim/denominator_sim
quotient_sim_se <- sd(quotient_sim)
return(quotient_sim_se)
}
my_df <- my_df %>% mutate(quotient_se = pmap_dbl(.[-1], calculate_quotient_se))
my_df %>% select(location, quotient_se)
#> # A tibble: 4 x 2
#> location quotient_se
#> <dbl> <dbl>
#> 1 1. 0.0684
#> 2 2. 0.0104
#> 3 3. 0.0993
#> 4 4. 0.0160
Here, . represents the data piped in, and [-1] drops location, which should not be passed to the function.
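One detail worth knowing: pmap passes named list elements as named arguments, so .[-1] appears to work here only because R's partial argument matching maps numerator_estimate onto numerator_estimate_f, and so on. A more explicit variant, sketched below, builds an unnamed list of the four columns so the arguments are matched by position. The result name my_df2 is an assumption added for illustration.

```r
library(dplyr)
library(purrr)

my_df <- tibble(location = c(1, 2, 3, 4),
                numerator_estimate = c(625, 180, 210, 1753),
                numerator_variance = c(22165, 2451, 11610, 172968),
                denominator_estimate = c(2278, 4742, 1115, 26892),
                denominator_variance = c(15870, 688, 7172, 1908288))

calculate_quotient_se <- function(numerator_estimate_f, numerator_variance_f,
                                  denominator_estimate_f, denominator_variance_f,
                                  iterations = 10000) {
  numerator_sim <- rnorm(n = iterations, mean = numerator_estimate_f,
                         sd = sqrt(numerator_variance_f))
  denominator_sim <- rnorm(n = iterations, mean = denominator_estimate_f,
                           sd = sqrt(denominator_variance_f))
  sd(numerator_sim / denominator_sim)
}

set.seed(47)  # make sampling reproducible

# The list is unnamed, so pmap_dbl() matches the four columns to the
# function's first four arguments by position, one row at a time
my_df2 <- my_df %>%
  mutate(quotient_se = pmap_dbl(
    list(numerator_estimate, numerator_variance,
         denominator_estimate, denominator_variance),
    calculate_quotient_se
  ))

my_df2 %>% select(location, quotient_se)
```

This version does not depend on column order or on the column names being prefixes of the argument names, at the cost of being a little more verbose.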
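Another option is to rewrite the function so that it accepts vector inputs. In this case, that probably means using matrices internally. At scale, this approach is almost always faster, though it may temporarily use more memory to store intermediate objects.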
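A minimal sketch of what such a vectorized rewrite might look like, assuming all four inputs have the same length. The function name calculate_quotient_se_vec and its shortened argument names are hypothetical, not part of the original code.

```r
calculate_quotient_se_vec <- function(num_est, num_var, den_est, den_var,
                                      iterations = 10000) {
  n <- length(num_est)
  # rnorm() recycles the length-n mean and sd vectors, and matrix() fills
  # column by column, so row i only contains draws for input element i
  numerator_sim <- matrix(rnorm(n * iterations, mean = num_est, sd = sqrt(num_var)),
                          nrow = n)
  denominator_sim <- matrix(rnorm(n * iterations, mean = den_est, sd = sqrt(den_var)),
                            nrow = n)
  quotient_sim <- numerator_sim / denominator_sim
  # One standard deviation per row, i.e. one simulated SE per input element
  apply(quotient_sim, 1, sd)
}

set.seed(47)  # make sampling reproducible
quotient_se <- calculate_quotient_se_vec(
  num_est = c(625, 180, 210, 1753),
  num_var = c(22165, 2451, 11610, 172968),
  den_est = c(2278, 4742, 1115, 26892),
  den_var = c(15870, 688, 7172, 1908288)
)
quotient_se
```

Because this version returns one value per input element, it can be called directly on whole columns, for example inside mutate(), without pmap or apply around it. The two n-by-iterations matrices are the intermediate objects that account for the extra memory.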