Calculating effect sizes between 3 groups for a set of variables in a dataset

Asked: 2017-02-14 03:54:47

Tags: r statistics data-manipulation

I want to compute the effect sizes of 3 treatments on 3 variables (x1, x2, x3). Suppose I have the following dataset:

set.seed(1234)

data <- data.frame(
  dose=factor(c(rep(1,25), rep(2,35), rep(3,40)), 
         labels = c("low", "middle", "high")),
  x1 = rnorm(100, 0, 2),
  x2 = rnorm(100, 3, 3),
  x3 = rnorm(100, 9, 4)
)

Now I want to compute the effect size for each pair of treatments. I found this function to compute Cohen's d.

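cohens_d <- function(x, y) {
  lx <- length(x) - 1                # degrees of freedom of group x
  ly <- length(y) - 1                # degrees of freedom of group y
  md  <- abs(mean(x) - mean(y))      # absolute mean difference (the sign is dropped)
  csd <- lx * var(x) + ly * var(y)   # pooled sum of squares
  csd <- csd/(lx + ly)               # pooled variance
  csd <- sqrt(csd)                   # pooled standard deviation

  cd  <- md/csd                      # Cohen's d
  # Hedges' g: small-sample correction applied to d, so the function returns g, not d.
  # Note: the commonly cited factor is 1 - 3/(4*(n1 + n2) - 9); here the -9 sits inside the parentheses.
  cd*(1-(3/(4*(length(x)+length(y)-9))))
  #print(cd)

}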
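For comparison, here is a minimal sketch of the same calculation with the correction factor written in its commonly cited approximate form, 1 - 3/(4(n1 + n2) - 9). The name `hedges_g` and the toy vectors `a` and `b` are assumptions made for illustration; they are not part of the original post.

```r
# Hypothetical helper (not from the original post): Cohen's d with the
# usual approximate small-sample correction, which gives Hedges' g.
hedges_g <- function(x, y) {
  nx <- length(x)
  ny <- length(y)
  # pooled standard deviation, weighting each variance by its degrees of freedom
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  # Cohen's d, taken as an absolute value to match cohens_d() above
  d <- abs(mean(x) - mean(y)) / pooled_sd
  # approximate correction factor: 1 - 3 / (4 * (nx + ny) - 9)
  d * (1 - 3 / (4 * (nx + ny) - 9))
}

# toy vectors, chosen only for illustration
a <- c(1, 2, 3, 4, 5)
b <- c(2, 3, 4, 5, 6)
g <- hedges_g(a, b)
g
```

With group sizes like those in the example data (25, 35 and 40), the two versions of the factor differ only slightly, but the gap grows as the groups get smaller.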

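Thanks a lot for your help.

Edit:

For example, below I compute the effect sizes for the three treatments (pairwise) on a single variable, x1. Ideally, I would like a general way to run these pairwise comparisons for all the variables in my dataset.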

cohens_d(data$x1[data$dose=="low"], data$x1[data$dose=="middle"])
cohens_d(data$x1[data$dose=="low"], data$x1[data$dose=="high"])
cohens_d(data$x1[data$dose=="middle"], data$x1[data$dose=="high"])

1 answer:

Answer 0 (score: 2)

df1$dose <- as.character(df1$dose)  # convert dose from factor to character
selected_cols <- colnames(df1)[2:4]  # columns 2 to 4, i.e. x1, x2, x3

library("reshape2")  # load reshape2 library
df1 <- melt(data = df1, id.vars = "dose", measure.vars = selected_cols, value.name = "value")  # reshape df1 to long format

# compute the effect size for every variable and every pair of doses
cohens_df1 <- with(df1, sapply( selected_cols, # loop through column names
                                function( x ) combn( unique(dose), 2 ,  # loop through pairs of dose combinations
                                                     function( y ) cohens_d( df1[ variable %in% x & dose %in% y[1], 'value' ], 
                                                                             df1[ variable %in% x & dose %in% y[2], 'value' ] ))))

# assign row names 
rownames(cohens_df1) <- combn( unique(df1$dose), 2 , function( y ) paste( y, collapse = '_' ) )
cohens_df1
#                    x1         x2          x3
# low_middle  0.3319591 0.09511378 0.321519422
# low_high    0.4982017 0.03265765 0.337651450
# middle_high 0.8221889 0.10799662 0.006570862
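
The same table can probably be built without reshaping the data. The following is a sketch in base R only, assuming the data is still in its original wide form; `wide`, `vars`, `pairs` and `effect_sizes` are hypothetical names introduced here, and `cohens_d()` is repeated from the question in condensed form so that the block runs on its own.

```r
# cohens_d() as posted in the question, condensed so the sketch is self-contained
cohens_d <- function(x, y) {
  lx <- length(x) - 1
  ly <- length(y) - 1
  md  <- abs(mean(x) - mean(y))
  csd <- sqrt((lx * var(x) + ly * var(y)) / (lx + ly))
  (md / csd) * (1 - (3 / (4 * (length(x) + length(y) - 9))))
}

set.seed(1234)
wide <- data.frame(   # `wide` is a hypothetical name for the unmelted data
  dose = factor(c(rep(1, 25), rep(2, 35), rep(3, 40)),
                levels = c(1, 2, 3), labels = c("low", "middle", "high")),
  x1 = rnorm(100, 0, 2),
  x2 = rnorm(100, 3, 3),
  x3 = rnorm(100, 9, 4)
)

vars  <- c("x1", "x2", "x3")
pairs <- combn(levels(wide$dose), 2)   # 2 x 3 character matrix, one dose pair per column

# outer loop over variables, inner loop over dose pairs;
# the result has one row per pair and one column per variable
effect_sizes <- sapply(vars, function(v) {
  apply(pairs, 2, function(p) {
    cohens_d(wide[[v]][wide$dose == p[1]], wide[[v]][wide$dose == p[2]])
  })
})
rownames(effect_sizes) <- apply(pairs, 2, paste, collapse = "_")
effect_sizes
```

Because the pairs come from `levels(wide$dose)`, the row order follows the factor levels (low, middle, high) rather than the order in which doses first appear in the data.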

Data:

set.seed(1234)    
df1 <- data.frame( dose = factor(c(rep(1,25), rep(2,35), rep(3,40)), levels = c(1, 2, 3), labels = c("low", "middle", "high")),
                   x1 = rnorm(100, 0, 2),
                   x2 = rnorm(100, 3, 3),
                   x3 = rnorm(100, 9, 4))