Scoping and function only provide a result for the first row of data

Time: 2018-03-06 00:03:55

Tags: r function dplyr scoping mutate

Forgive me, this is my first time asking a question online.

First, some data to make the question easier to ask.

location <- c(1, 2, 3, 4)
numerator_estimate <- c(625, 180, 210, 1753)
numerator_variance <- c(22165, 2451, 11610, 172968)
denominator_estimate <- c(2278 , 4742, 1115, 26892)
denominator_variance <- c(15870, 688, 7172, 1908288)
my_df <- data.frame(location, numerator_estimate, numerator_variance,
                    denominator_estimate, denominator_variance)

This function bootstraps the SE of a quotient from the estimates and variances of the numerator and denominator.

calculate_quotient_se <- function(numerator_estimate_f, numerator_variance_f, denominator_estimate_f, denominator_variance_f, iterations = 10000){
  numerator_sim <- rnorm(n = iterations, mean = numerator_estimate_f, sd = sqrt(numerator_variance_f))
  denominator_sim <- rnorm(n = iterations, mean = denominator_estimate_f, sd = sqrt(denominator_variance_f))
  quotient_sim <- numerator_sim/denominator_sim
  quotient_sim_se <- sd(quotient_sim)
  return(quotient_sim_se)
}

This function calculates the quotient. It is included to show that calculate_quotient_se does not work while another function does.

calculate_quotient <- function(numerator_estimate_f, denominator_estimate_f){
  # return the quotient visibly rather than as the invisible value of an assignment
  numerator_estimate_f / denominator_estimate_f
}

library(dplyr)  # provides %>% and mutate()

my_df2 <- my_df %>%
  mutate(quotient_se = calculate_quotient_se(numerator_estimate, numerator_variance, denominator_estimate, denominator_variance, iterations = 10000),
         quotient = calculate_quotient(numerator_estimate, denominator_estimate))
my_df2

Note that quotient_se only works for the first row, and that value is copied to every additional row.

It does not work this way either:

my_df$q_se <- calculate_quotient_se(numerator_estimate, numerator_variance, denominator_estimate, denominator_variance, iterations = 10000)
my_df

If I type everything in like this, it works:

(x1 <- calculate_quotient_se(625, 22165, 2278, 15870))
(x2 <- calculate_quotient_se(180, 2451, 4742, 688))
(x3 <- calculate_quotient_se(210, 11610, 1115, 7172))
(x4 <- calculate_quotient_se(1753, 172968, 26892, 1908288))

Any suggestions on how to get the simulated SE into the data frame for further calculations?

2 answers:

Answer 0 (score: 0)

my_df$quotient_se <- 
    apply(my_df, 1, function(x) calculate_quotient_se(x[2], x[3], x[4], x[5]))

my_df$quotient <- 
    apply(my_df, 1, function(x) calculate_quotient(x[2],x[4]))
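
As a note on why a row-wise call is needed: rnorm() recycles vector mean and sd arguments across the draws, and sd() then collapses everything to a single number, which mutate() or `$<-` recycles down the column. That single value comes from draws pooled over all rows, so it need not equal the first row's SE. The sketch below checks this on the question's data and uses mapply() as a base R alternative to apply() that passes the columns themselves instead of indexing by position. The name pooled_se and the seed are only for illustration.

```r
# Same data and function as in the question
my_df <- data.frame(
  location = c(1, 2, 3, 4),
  numerator_estimate = c(625, 180, 210, 1753),
  numerator_variance = c(22165, 2451, 11610, 172968),
  denominator_estimate = c(2278, 4742, 1115, 26892),
  denominator_variance = c(15870, 688, 7172, 1908288)
)

calculate_quotient_se <- function(numerator_estimate_f, numerator_variance_f,
                                  denominator_estimate_f, denominator_variance_f,
                                  iterations = 10000) {
  numerator_sim <- rnorm(n = iterations, mean = numerator_estimate_f, sd = sqrt(numerator_variance_f))
  denominator_sim <- rnorm(n = iterations, mean = denominator_estimate_f, sd = sqrt(denominator_variance_f))
  sd(numerator_sim / denominator_sim)
}

set.seed(1)  # arbitrary seed, only for reproducibility

# Whole columns in: rnorm() recycles the four means and sds over the draws,
# and sd() reduces the pooled draws to one number.
pooled_se <- with(my_df, calculate_quotient_se(
  numerator_estimate, numerator_variance,
  denominator_estimate, denominator_variance
))
length(pooled_se)  # 1

# One call per row: mapply() walks the four columns in parallel.
my_df$quotient_se <- mapply(
  calculate_quotient_se,
  my_df$numerator_estimate, my_df$numerator_variance,
  my_df$denominator_estimate, my_df$denominator_variance
)
my_df
```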

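Answer 1 (score: 0)

If you have a function that isn't vectorized, you can apply it across the rows of a dataset with purrr::pmap, which iterates a function in parallel over the elements of a list (in this case, a data frame). Since you want the result simplified to a numeric vector, use the pmap_dbl version: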

library(tidyverse)
set.seed(47)    # make sampling reproducible

# tibble() replaces the deprecated data_frame()
my_df <- tibble(location = c(1, 2, 3, 4),
                numerator_estimate = c(625, 180, 210, 1753),
                numerator_variance = c(22165, 2451, 11610, 172968),
                denominator_estimate = c(2278, 4742, 1115, 26892),
                denominator_variance = c(15870, 688, 7172, 1908288))

calculate_quotient_se <- function(numerator_estimate_f, numerator_variance_f, denominator_estimate_f, denominator_variance_f, 
                                  iterations = 10000){
    numerator_sim <- rnorm(n = iterations, mean = numerator_estimate_f, sd = sqrt(numerator_variance_f))
    denominator_sim <- rnorm(n = iterations, mean = denominator_estimate_f, sd = sqrt(denominator_variance_f))
    quotient_sim <- numerator_sim/denominator_sim
    quotient_sim_se <- sd(quotient_sim)
    return(quotient_sim_se)
}

my_df <- my_df %>% mutate(quotient_se = pmap_dbl(.[-1], calculate_quotient_se))

my_df %>% select(location, quotient_se)
#> # A tibble: 4 x 2
#>   location quotient_se
#>      <dbl>       <dbl>
#> 1       1.      0.0684
#> 2       2.      0.0104
#> 3       3.      0.0993
#> 4       4.      0.0160

Here, . represents the data piped in, and [-1] drops location, which should not be passed to the function.
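
Dropping a column by position depends on location staying first, and since the remaining columns are named, the call appears to rely on R's partial argument matching (numerator_estimate matching numerator_estimate_f). A minimal sketch of a variant that lists the columns explicitly and passes them by position instead, assuming the same data and function as above (the name result is just for illustration):

```r
library(dplyr)
library(purrr)

my_df <- data.frame(
  location = c(1, 2, 3, 4),
  numerator_estimate = c(625, 180, 210, 1753),
  numerator_variance = c(22165, 2451, 11610, 172968),
  denominator_estimate = c(2278, 4742, 1115, 26892),
  denominator_variance = c(15870, 688, 7172, 1908288)
)

calculate_quotient_se <- function(numerator_estimate_f, numerator_variance_f,
                                  denominator_estimate_f, denominator_variance_f,
                                  iterations = 10000) {
  numerator_sim <- rnorm(n = iterations, mean = numerator_estimate_f, sd = sqrt(numerator_variance_f))
  denominator_sim <- rnorm(n = iterations, mean = denominator_estimate_f, sd = sqrt(denominator_variance_f))
  sd(numerator_sim / denominator_sim)
}

set.seed(47)

# The list is unnamed, so its elements are matched to the function's
# arguments by position; column order in the data frame no longer matters.
result <- my_df %>%
  mutate(quotient_se = pmap_dbl(
    list(numerator_estimate, numerator_variance,
         denominator_estimate, denominator_variance),
    calculate_quotient_se
  ))

result %>% select(location, quotient_se)
```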

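Another option is to rearrange the function so that it can take vector input. In this case, that probably means using matrices internally. At scale, such an approach will almost always be faster, but it may temporarily use more memory to store intermediate objects.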
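
A rough sketch of what such a vectorized version could look like. The name calculate_quotient_se_vec and its shortened argument names are hypothetical, and it assumes all four inputs have the same length. It uses the fact that rnorm() recycles mean and sd across draws, so filling a matrix column by column with one row per input puts every draw for input i in row i.

```r
# Hypothetical vectorized variant: one SE per element of the input vectors
calculate_quotient_se_vec <- function(num_est, num_var, den_est, den_var,
                                      iterations = 10000) {
  n_rows <- length(num_est)
  # Draw k uses mean[(k - 1) %% n_rows + 1], and matrix() fills column-wise,
  # so row i of each matrix holds the draws for input i.
  numerator_sim <- matrix(
    rnorm(n_rows * iterations, mean = num_est, sd = sqrt(num_var)),
    nrow = n_rows
  )
  denominator_sim <- matrix(
    rnorm(n_rows * iterations, mean = den_est, sd = sqrt(den_var)),
    nrow = n_rows
  )
  # Standard deviation of the simulated quotients, row by row
  apply(numerator_sim / denominator_sim, 1, sd)
}

set.seed(47)
quotient_se <- calculate_quotient_se_vec(
  num_est = c(625, 180, 210, 1753),
  num_var = c(22165, 2451, 11610, 172968),
  den_est = c(2278, 4742, 1115, 26892),
  den_var = c(15870, 688, 7172, 1908288)
)
quotient_se
```

Because it returns one value per row, a function like this could be called directly inside mutate() without pmap_dbl. The two intermediate matrices each hold n_rows * iterations numbers, which is the extra memory cost mentioned above.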