Performing a division calculation in R inside a loop with a varying denominator

Time: 2015-02-13 18:28:12

Tags: r loops statistics-bootstrap

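I am using bootstrapping to obtain 95% CIs for a mean calculation across multiple evaluation units (EUs). Without bootstrapping, the calculation is:

EU prevalence = sum(cluster prevalence) / number of clusters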
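
As a small illustration of that formula, the point estimate for each EU is simply the mean of its cluster-level values. The sketch below assumes a toy data frame that reuses the column names from the sample table; the name `toy` and the values are invented for illustration only.

```r
# Toy data in the layout of the sample table (values are illustrative only)
toy <- data.frame(
  eu           = c(640, 640, 640, 678, 678),
  cluster      = c(1, 2, 3, 1, 2),
  cluster_prev = c(0.2, 0.4, 0.6, 1.0, 0.5)
)

# sum(cluster prevalence) / number of clusters, computed separately per EU
eu_prev <- tapply(toy$cluster_prev, toy$eu,
                  function(x) sum(x) / length(x))
print(eu_prev)
```

Because the denominator is taken from the rows that belong to each EU, it changes automatically when EUs have different numbers of clusters.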

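The problem is that, for example, some EUs have 25 clusters and others have 30. Please help me put some code inside the loop so that the correct number of clusters is used automatically in each EU's calculation, by linking to another table; see line 10 of the code, where the count is currently hard-coded (I keep thinking of VLOOKUP in Excel).

The "dataset" table is grouped by EU and cluster and holds cluster-level prevalence values. Here is a sample of what it looks like: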

eu    cluster    cluster_prev
640   1          0.23
640   2          0.78
...
640   25         0.78
678   1          0.97
...
678   27         1.2
681   1          0
...
681   31         0.78

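Then there is a table named "cluster_count". It is grouped by EU and has 2 columns: EU and cluster_ct (the number of clusters in the EU). This is the part I can't figure out how to incorporate. Here is a sample of what cluster_count looks like: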

EU    cluster_ct
640   25
678   27
681   31
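
For readers who think of this as a VLOOKUP, one possible R equivalent is `match()`, which returns the row position of a key in a lookup column. This is a minimal sketch under assumptions: the helper name `lookup_clusters` is hypothetical, and the data frame simply retypes the sample rows shown above.

```r
# The sample cluster_count table from above, typed in by hand
cluster_count <- data.frame(EU = c(640, 678, 681),
                            cluster_ct = c(25, 27, 31))

# Hypothetical helper: return cluster_ct for one or more EU ids,
# or NA when an id is not present in the lookup table
lookup_clusters <- function(eu_id, counts = cluster_count) {
  counts$cluster_ct[match(eu_id, counts$EU)]
}

lookup_clusters(678)           # one EU
lookup_clusters(c(681, 640))   # several EUs at once
```

`match()` coerces both sides to a common type, so a factor level passed as the character string "678" finds the same row as the number 678.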

Here is the code:

#Load, transform data
dataset <- read.csv("ttprev_cluster.csv") 
str(dataset)
dataset$eu <- as.factor(dataset$EU)
dataset$cluster <- as.factor(dataset$CLUSTER)
dataset$cluster_prev <- dataset$adj_tt

#Boot statistic function 
clustermean <- function(df, i) {

    #this is the number that I want to replace with code
    num_clusters <- 25 

    r <- round(runif(num_clusters, 1, nrow(df)))

    df2 <- numeric()
    for (i in 1:num_clusters) 
        df2[i] <- df[r[i],]$cluster_prev

    return(mean(df2))  
}

#create empty data frame for results
bootResult <- data.frame(eu=character(), bootmean=numeric(), se=numeric(), ci95_low=numeric(), ci95_high=numeric(), stringsAsFactors=FALSE)

#Bootstrap function, looped over each EU
library(boot)
num_reps <- 10000 
for (i in 1:nlevels(dataset$eu)) {
    data2 <- subset(dataset, eu==levels(eu)[i])
    b <- boot(data2, clustermean, num_reps)
    m <- mean(b$t)
    se <- sd(b$t)

    #calculate 2.5/97.5 percentiles as Confidence Interval
    q <- quantile(b$t, c(0.025, 0.975))
    ci_lower <- q[1]
    ci_upper <- q[2]
}

2 Answers:

Answer 0 (score: 0)

The preferred approach is to take advantage of the `...` argument of boot(). For example:

#Boot statistic function
clustermean <- function(df, 
                        i,
                        num_clusters # num_clusters is now an argument to clustermean
                        ) {
    # blah blah blah
}


# blah blah blah

for (i in blahBlahBlah) {

    #calculate num_clusters here
    # (outside subset(), the factor has to be referenced as dataset$eu)
    num_clusters <- cluster_count[cluster_count$EU == levels(dataset$eu)[i],
                                  'cluster_ct']

    b <- boot(data2, 
              clustermean, 
              num_reps,

              # additional arguments supplied to `boot()` that
              # don't match the formal arguments to boot 
              # are passed on to the 'statistic' function:

              # (note that you have to name this argument so 
              # it isn't matched positionally)

              num_clusters=num_clusters)


    # blah blah blah 

}
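
To make the outline above concrete, here is a self-contained sketch of the same idea: the cluster count is looked up per EU and handed to the statistic through the `...` argument of boot(). The toy data, the small `R = 200`, the loop variable `lev`, and the use of `sample()` and `match()` are assumptions made for illustration and are not taken from the original code.

```r
library(boot)

set.seed(1)  # only so the sketch is reproducible

# Toy stand-ins for the two tables (values are invented)
dataset <- data.frame(
  eu           = factor(rep(c("640", "678"), times = c(5, 7))),
  cluster_prev = runif(12)
)
cluster_count <- data.frame(EU = c(640, 678), cluster_ct = c(5, 7))

# Statistic: draw num_clusters rows with replacement and average them.
# `i` (the indices boot supplies) is unused here, as in the question.
clustermean <- function(df, i, num_clusters) {
  r <- sample(nrow(df), num_clusters, replace = TRUE)
  mean(df$cluster_prev[r])
}

results <- list()
for (lev in levels(dataset$eu)) {
  data2 <- subset(dataset, eu == lev)
  # Look up this EU's cluster count in the second table
  num_clusters <- cluster_count$cluster_ct[match(lev, cluster_count$EU)]
  # The named extra argument is forwarded to clustermean()
  b <- boot(data2, clustermean, R = 200, num_clusters = num_clusters)
  q <- quantile(b$t, c(0.025, 0.975))
  results[[lev]] <- data.frame(eu = lev, bootmean = mean(b$t), se = sd(b$t),
                               ci95_low = q[[1]], ci95_high = q[[2]])
}
bootResult <- do.call(rbind, results)
print(bootResult)
```

Collecting one small data frame per EU and binding them at the end is one way to fill a results table; other approaches would work as well.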

Answer 1 (score: 0)

A different colleague helped me work out the syntax inside the clustermean function, and I ended up with the following (and it works!!):

num_clusters <- nrow(df)
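
This works because the data passed to boot() has already been subset to a single EU, so, assuming one row per cluster, nrow(df) is that EU's cluster count. A related alternative, sketched here as an assumption rather than as the author's final code, is to use the index vector that boot() supplies as the second argument of the statistic. The function name `clustermean_idx` and the toy values are hypothetical.

```r
library(boot)

set.seed(42)  # only for reproducibility of the sketch

# Toy data for one EU with 6 clusters (values invented for illustration)
data2 <- data.frame(eu = factor("640"),
                    cluster = 1:6,
                    cluster_prev = c(0.10, 0.25, 0.00, 0.40, 0.15, 0.30))

# boot() passes resampled row indices `idx`; length(idx) equals nrow(df),
# so the denominator adapts to each EU without a lookup table.
clustermean_idx <- function(df, idx) {
  mean(df$cluster_prev[idx])
}

b <- boot(data2, clustermean_idx, R = 500)
b$t0                             # statistic on the original data
quantile(b$t, c(0.025, 0.975))   # percentile interval
```

With this form the resampling is done by boot() itself, so the statistic does not need its own random draw.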