How do I write a loop that applies a summary function to each piece of a split data frame?

Time: 2019-07-04 00:54:10

Tags: r loops dataframe statistics

I am new to loops and functions in R.

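Goal: I have a data frame that I want to split into multiple data frames based on a specific set of identifiers. I then want to summarize each of the resulting data frames, and I would like to use a loop to do so.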

The first thing I did was split the data frame by tissue and gene:

Sb_qPCR <- split(Sb_qPCR, list(Sb_qPCR$Tissue, Sb_qPCR$Gene))

This worked. I got a list that breaks my original data frame into smaller data frames, one per unique combination of tissue and gene.
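A quick way to inspect what `split()` returns, sketched on made-up toy data (the `toy` and `pieces` names and all values are assumptions for illustration, not the real `Sb_qPCR` data): the result is an ordinary named list, and each element is still a data frame.

```r
# Toy data with the same Tissue/Gene column names as the question (values are made up)
toy <- data.frame(
  Tissue = c("RT", "RT", "L2", "L2"),
  Gene   = c("SOS1", "NHX2", "SOS1", "NHX2"),
  Ct     = c(3.1, 3.3, 0.6, 3.5)
)

# Splitting on a list of two columns gives one element per Tissue/Gene combination
pieces <- split(toy, list(toy$Tissue, toy$Gene))

class(pieces)                        # the container is a plain list
names(pieces)                        # element names have the form "<Tissue>.<Gene>"
is.data.frame(pieces[["RT.SOS1"]])   # each element is itself a data frame
```

Combinations that never occur in the data still get an (empty) element unless `drop = TRUE` is passed to `split()`.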

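Next, I want to summarize each individual data frame (are the pieces of a split data frame still called data frames?) and save the summary output. Using the summarySE function I found online, I can summarize each piece successfully, as long as I call the function on one data frame at a time. Here is the summary function: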

summarySE <- function(data=NULL, measurevar, groupvars=NULL, na.rm=FALSE,
                      conf.interval=.95, .drop=TRUE) {
    library(plyr)

    # New version of length which can handle NA's: if na.rm==T, don't count them
    length2 <- function (x, na.rm=FALSE) {
        if (na.rm) sum(!is.na(x))
        else       length(x)
    }

    # This does the summary. For each group's data frame, return a vector with
    # N, mean, and sd
    datac <- ddply(data, groupvars, .drop=.drop,
      .fun = function(xx, col) {
        c(N    = length2(xx[[col]], na.rm=na.rm),
          mean = mean   (xx[[col]], na.rm=na.rm),
          sd   = sd     (xx[[col]], na.rm=na.rm)
        )
      },
      measurevar
    )

    # Rename the "mean" column
    datac <- rename(datac, c("mean" = measurevar))

    datac$se <- datac$sd / sqrt(datac$N)  # Calculate standard error of the mean

    # Confidence interval multiplier for standard error
    # Calculate t-statistic for confidence interval:
    # e.g., if conf.interval is .95, use .975 (above/below), and use df=N-1
    ciMult <- qt(conf.interval/2 + .5, datac$N-1)
    datac$ci <- datac$se * ciMult

    return(datac)
}

I am able to run the function on a single piece like this:

RT.NHX2.sum <-summarySE(Sb_qPCR$RT.NHX2,measurevar="Ct",groupvars = c("Time","ID"))

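Because the split list contains many individual data frames, I would like to write a loop that runs summarySE on every element of the list, but I cannot figure out how to do it. Is this possible?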
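One possible approach is `lapply()`, which calls a function on every element of a list and returns the results as a list with the same names. The sketch below is only an illustration: `summarise_ct` is a hypothetical base-R stand-in for summarySE (so the example runs without plyr), and `toy` holds made-up values. Assuming summarySE is defined, it could be passed to `lapply()` in the same way, since it takes the same `measurevar` and `groupvars` arguments.

```r
# Hypothetical base-R stand-in for summarySE(): N, mean, sd and se per group
summarise_ct <- function(data, measurevar, groupvars) {
  out <- aggregate(
    data[[measurevar]],
    by  = data[groupvars],
    FUN = function(x) c(N = length(x), mean = mean(x), sd = sd(x))
  )
  # aggregate() stores the three statistics in one matrix column named "x";
  # spread them into ordinary columns next to the grouping columns
  out <- cbind(out[groupvars], as.data.frame(out$x))
  out$se <- out$sd / sqrt(out$N)
  out
}

# Made-up data using the question's column names
toy <- data.frame(
  Tissue = rep(c("RT", "L2"), each = 4),
  Gene   = "SOS1",
  Time   = rep(c("t-0m", "t-1h"), times = 4),
  ID     = "Sb1_Salt",
  Ct     = c(3.1, 3.3, 0.6, 3.5, 4.1, 2.0, 2.2, 2.8)
)

# drop = TRUE skips Tissue/Gene combinations that have no rows
pieces <- split(toy, list(toy$Tissue, toy$Gene), drop = TRUE)

# Apply the summary to every piece; extra arguments are forwarded to the function
summaries <- lapply(pieces, summarise_ct,
                    measurevar = "Ct", groupvars = c("Time", "ID"))

names(summaries)          # same names as the split list
summaries[["RT.SOS1"]]    # summary for a single piece
```

Keeping the results in a named list avoids creating one variable per piece; any single summary is still available by name, as in `summaries[["RT.SOS1"]]`.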

Edit

Here is a sample of the data I am working with:

   > dput(head(Sb_qPCR,5))
    structure(list(ID = structure(c(4L, 4L, 4L, 2L, 2L), .Label = c("Sb1_Control", 
    "Sb1_Salt", "Sb10_Control", "Sb10_Salt"), class = "factor"), 
    Genotype = structure(c(2L, 2L, 2L, 1L, 1L), .Label = c("Sb1", 
    "Sb10"), class = "factor"), Tissue = structure(c(2L, 2L, 
    2L, 2L, 2L), .Label = c("L2", "RT", "TL"), class = "factor"), 
    Time = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("t-0m", 
    "t-12h", "t-1h", "t-24h", "t-2h", "t-30m", "t-3d", "t-3h", 
    "t-6h"), class = "factor"), Gene = structure(c(3L, 3L, 3L, 
    3L, 3L), .Label = c("HKT3", "NHX2", "SOS1"), class = "factor"), 
    Treatment = structure(c(2L, 2L, 2L, 2L, 2L), .Label = c("Control", 
    "Salt"), class = "factor"), Ct = c(3.1334, 3.2518, 0.6313, 
    3.4878, 4.1403), Ct_Exp = c(3.1334, 3.2518, 0.6313, 3.4878, 
    4.1403), Ct_Cont = c(2.1029, 0.0167, 0.5565, 2.7948, 2.2366
    ), FC_raw = c(0.4895, 0.1062, 0.9495, 0.6186, 0.2673), FC_graph = c(-2.0427, 
    -9.4155, -1.0532, -1.6166, -3.7416)), row.names = c(NA, 5L
    ), class = "data.frame")

The first step I took was to split the data frame, like this:

  Sb_qPCR <- split(Sb_qPCR, list(Sb_qPCR$Tissue, Sb_qPCR$Gene))

This is what the split list looks like:

  > names(Sb_qPCR)
   [1] "L2.HKT3" "RT.HKT3" "TL.HKT3" "L2.NHX2" "RT.NHX2" "TL.NHX2" "L2.SOS1" "RT.SOS1"
   [9] "TL.SOS1"

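Next, I want to use the summarySE function shown above to summarize each data frame, e.g. L2.HKT3, and save the output as a new data frame. For a single data frame I can do this with the following code: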

    RT_SOS1.sum <-summarySE(Sb_qPCR$RT.SOS1,measurevar="Ct",groupvars = c("Time","ID"))

How can I put this into a loop that calls the function on each data frame in turn and saves each result to a new data frame?
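An explicit `for` loop over the list names is another option. The following is a minimal sketch on made-up data, not a tested solution for the real `Sb_qPCR`: the `toy`, `results`, `combined` and `Group` names are assumptions, and a simple group mean from `aggregate()` stands in for the summarySE call so that the example runs without plyr.

```r
# Made-up data using the question's column names
toy <- data.frame(
  Tissue = rep(c("RT", "L2"), each = 4),
  Gene   = "SOS1",
  Time   = rep(c("t-0m", "t-1h"), times = 4),
  ID     = "Sb1_Salt",
  Ct     = c(3.1, 3.3, 0.6, 3.5, 4.1, 2.0, 2.2, 2.8)
)
pieces <- split(toy, list(toy$Tissue, toy$Gene), drop = TRUE)

# Pre-allocate a named list to hold one summary per piece
results <- vector("list", length(pieces))
names(results) <- names(pieces)

for (nm in names(pieces)) {
  piece <- pieces[[nm]]
  # Stand-in summary (group means only). With summarySE defined, this line could be:
  # res <- summarySE(piece, measurevar = "Ct", groupvars = c("Time", "ID"))
  res <- aggregate(Ct ~ Time + ID, data = piece, FUN = mean)
  res$Group <- nm          # remember which piece each row came from
  results[[nm]] <- res
}

# Optionally stack all summaries into a single data frame
combined <- do.call(rbind, results)
rownames(combined) <- NULL
combined
```

Storing each result under its piece name keeps the summaries individually accessible (for example `results[["RT.SOS1"]]`), while the `Group` column keeps the origin of each row visible after stacking.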

0 answers:

No answers yet