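To compute effect sizes (d or g) and meta-analyze a dichotomous predictor of a continuous outcome, a data frame containing the mean, SD, and sample size for each study is required.
I am trying to write some code that creates the required data frame from raw data, so that this does not have to be done manually for each study.
Example raw data set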
Study <- c("andrew", "andrew", "andrew", "andrew", "peters", "peters", "peters", "jess", "jess", "jess")
Score = c(100, 308, 584, 241, 241, 111, 431, 123, 321, 411)
Sex = c(1, 1, 1, 2, 2, 1, 2, 2, 1, 1)
data = cbind(Score, Sex, Study)
data
> Score Sex Study
> [1,] "100" "1" "andrew"
> [2,] "308" "1" "andrew"
> [3,] "584" "1" "andrew"
> [4,] "241" "2" "andrew"
> [5,] "241" "2" "peters"
> [6,] "111" "1" "peters"
> [7,] "431" "2" "peters"
> [8,] "123" "2" "jess"
> [9,] "321" "1" "jess"
> [10,] "411" "1" "jess"
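As the printout shows, cbind() on mixed vectors yields a character matrix, so Score and Sex are stored as strings. The following is a minimal sketch, assuming a data.frame with numeric columns is what the later summarising steps need; it also checks one cell (males, Sex == 1, in "andrew") by hand.

```r
Study <- c("andrew", "andrew", "andrew", "andrew", "peters", "peters",
           "peters", "jess", "jess", "jess")
Score <- c(100, 308, 584, 241, 241, 111, 431, 123, 321, 411)
Sex   <- c(1, 1, 1, 2, 2, 1, 2, 2, 1, 1)

# data.frame() keeps each column's own type, unlike cbind()
data <- data.frame(Study, Score, Sex)
str(data)

# Summaries for a single Study x Sex cell, computed directly
cell <- subset(data, Study == "andrew" & Sex == 1)
check <- c(mean = mean(cell$Score), sd = sd(cell$Score), n = nrow(cell))
check
```

Functions such as mean() and sd() would fail or need conversion on the character matrix, which is why the data.frame form is assumed here.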
How can I convert the data, by sex and study, into the following format for metafor?
Study MeanMale MeanFemale SDMale SDfemale NrowsMale NrowsFemale
andrew X X X X X X
peters X X X X X X
jess X X X X X X
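I think describeBy, statsBy, or splitting the data with sapply would work, but getting the result into the required format is messy. The next goal is to introduce a Year column, for example,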
Study <- c("andrew", "andrew", "andrew", "andrew", "peters", "peters", "peters", "jess", "jess", "jess")
Score = c(100, 308, 584, 241, 241, 111, 431, 123, 321, 411)
Sex = c(1, 1, 1, 2, 2, 1, 2, 2, 1, 1)
Year = c(1992, 1992, 1992, 1992, 1988, 1988, 1988, 1977, 1977, 1977)
data = cbind(Study, Year, Score, Sex)
to produce the following data.frame
Study Year MeanMale MeanFemale SDMale SDfemale NrowsMale NrowsFemale
andrew 1992 X X X X X X
peters 1988 X X X X X X
jess 1977 X X X X X X
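Answer 0 (score: 1)
We can use dcast from the development version of data.table, i.e. v1.9.5. Instructions for installing the devel version are here.
We convert the 'data.frame' to a 'data.table' (setDT(data)), group by 'Sex' and 'Study', get the mean, sd, and .N (nrows), and use dcast (from data.table, which can take multiple value.var columns) to reshape from 'long' to 'wide' format.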
library(data.table)#v1.9.5+
dcast(setDT(data)[, list(Mean= mean(Score), SD= sd(Score), Nrows=.N),
.(Sex, Study)], Study~ c('Male', 'Female')[Sex],
value.var=c('Mean', 'SD', 'Nrows'))
# Study Female_Mean Male_Mean Female_SD Male_SD Female_Nrows Male_Nrows
#1: andrew 241 330.6667 NA 242.79484 1 3
#2: jess 123 366.0000 NA 63.63961 1 2
#3: peters 336 111.0000 134.3503 NA 2 1
From @Arun's comments, the fun.agg argument of dcast in data.table also accepts multiple functions.
dcast(setDT(data), Study ~ c('Male', 'Female')[Sex],
fun.agg=list(mean, sd, length), value.var="Score")
# Study Female_mean_Score Male_mean_Score Female_sd_Score Male_sd_Score
#1: andrew 241 330.6667 NA 242.79484
#2: jess 123 366.0000 NA 63.63961
#3: peters 336 111.0000 134.3503 NA
# Female_length_Score Male_length_Score
#1: 1 3
#2: 1 2
#3: 2 1
Or we can use reshape from base R after getting the mean, sd, and nrow with aggregate.
d1 <- do.call(data.frame,aggregate(Score~., transform(data, Sex=c('Male',
'Female')[Sex]), FUN=function(x) c(Mean=mean(x), SD=sd(x), Nrows=length(x))))
reshape(d1, idvar='Study', timevar='Sex', direction='wide')
# Study Score.Mean.Female Score.SD.Female Score.Nrows.Female Score.Mean.Male
#1 andrew 241 NA 1 330.6667
#3 jess 123 NA 1 366.0000
#5 peters 336 134.3503 2 111.0000
# Score.SD.Male Score.Nrows.Male
#1 242.79484 3
#3 63.63961 2
#5 NA 1
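The Year column from the question's second example can be carried through the same base R route by treating it as a second ID variable. This is a minimal sketch under the assumption that Year is constant within each study; the names data2, d2, and wide2 are chosen here for illustration only.

```r
Study <- c("andrew", "andrew", "andrew", "andrew", "peters", "peters",
           "peters", "jess", "jess", "jess")
Score <- c(100, 308, 584, 241, 241, 111, 431, 123, 321, 411)
Sex   <- c(1, 1, 1, 2, 2, 1, 2, 2, 1, 1)
Year  <- c(1992, 1992, 1992, 1992, 1988, 1988, 1988, 1977, 1977, 1977)

# Recode Sex (1 = Male, 2 = Female) while building the data.frame
data2 <- data.frame(Study, Year, Score, Sex = c("Male", "Female")[Sex])

# Mean, SD, and count for every Study x Year x Sex cell
d2 <- do.call(data.frame,
              aggregate(Score ~ Study + Year + Sex, data2,
                        FUN = function(x) c(Mean = mean(x), SD = sd(x),
                                            Nrows = length(x))))

# Long to wide: one row per Study/Year, one set of columns per Sex
wide2 <- reshape(d2, idvar = c("Study", "Year"), timevar = "Sex",
                 direction = "wide")
wide2
```

Cells with a single observation still get an SD of NA, as in the outputs above.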
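Answer 1 (score: 0)
This gets pretty close with dplyr and reshape2. We convert Sex to a named factor, use mutate to get the SD and sample size by group, then melt and cast the data to get the group means with nice variable names: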
library(reshape2); library(dplyr)
# Assumes data is a data.frame with numeric Score and Sex coded 1/2
data$Sex <- factor(data$Sex, levels = c(1, 2), labels = c('Male', 'Female'))
# Group by Study and Sex so SD and Nrow are computed per Study x Sex cell
data <- mutate(group_by(data, Study, Sex), SD = sd(Score), Nrow = length(Score))
data <- melt(as.data.frame(data), id.vars = c('Study', 'Sex'))
data$value <- as.numeric(data$value)
dcast(data, Study ~ variable + Sex, mean, na.rm = TRUE)
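Once a wide table of means, SDs, and counts exists, the standardized mean difference mentioned in the question can be computed per study. The following is a minimal base R sketch using the usual pooled-SD formula for Cohen's d and the common approximate small-sample correction for Hedges' g. The function name smd and the input numbers are hypothetical, and metafor's own routines are not used here.

```r
# Hypothetical helper: Cohen's d and Hedges' g from two-group summaries
smd <- function(m1, sd1, n1, m2, sd2, n2) {
  # Pooled standard deviation of the two groups
  sp <- sqrt(((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2))
  d  <- (m1 - m2) / sp
  # Approximate small-sample correction factor
  J  <- 1 - 3 / (4 * (n1 + n2 - 2) - 1)
  c(d = d, g = J * d)
}

# Made-up summaries for one study (not taken from the example data)
res <- smd(m1 = 10, sd1 = 2, n1 = 10, m2 = 8, sd2 = 2, n2 = 10)
res
```

Groups with a single observation have an SD of NA, so d and g come out as NA for those studies.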