Statistics on a list of data frames

Time: 2019-10-31 17:24:32

Tags: list dataframe statistics

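I have a list of two data frames: the first element (CTRL) holds the control patients and the second (SICK) holds the sick patients. Each df contains the relative abundances of microbes for 3 patients: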

List of 2
 $ CTRL  :'data.frame': 3 obs. of  18107 variables:
  ..$ Azorhizobium caulinodans                                           : num [1:3] 1.48e-07 1.62e-06 1.05e-06
  ..$ Buchnera aphidicola                                                : num [1:3] 9.63e-07 1.01e-06 8.09e-07
  ..$ Cellulomonas gilvus                                                : num [1:3] 1.63e-06 5.39e-07 4.05e-07
  ..$ Dictyoglomus thermophilum                                          : num [1:3] 2.30e-06 3.17e-06 1.34e-06
  ..$ Pelobacter carbinolicus                                            : num [1:3] 9.63e-07 3.70e-06 1.38e-06
  ..$ Shewanella colwelliana                                             : num [1:3] 9.63e-07 1.89e-06 1.62e-07
  ..$ Myxococcus fulvus                                                  : num [1:3] 1.78e-06 4.65e-06 1.50e-06
 $ SICK  :'data.frame': 3 obs. of  18107 variables:
  ..$ Azorhizobium caulinodans                                           : num [1:3] 4.24e-07 0.00 1.28e-06
  ..$ Buchnera aphidicola                                                : num [1:3] 5.45e-07 6.02e-07 4.47e-07
  ..$ Cellulomonas gilvus                                                : num [1:3] 3.03e-07 0.00 2.23e-07
  ..$ Dictyoglomus thermophilum                                          : num [1:3] 6.66e-07 2.75e-06 1.96e-06
  ..$ Pelobacter carbinolicus                                            : num [1:3] 9.69e-07 1.72e-07 1.62e-06
  ..$ Shewanella colwelliana                                             : num [1:3] 1.76e-06 6.02e-07 3.91e-07
  ..$ Myxococcus fulvus                                                  : num [1:3] 6.66e-07 8.60e-07 1.56e-06

I want to compute some statistics for each taxon (CTRL vs. SICK) and save the result for each taxon in a separate df (results.mw). I tried:

results.mw = lapply(mylist, function(d, l)
  {
  # Run wilcoxon by column
    as.data.frame(wilcox.test(d, l, exact = F)$p.value)
  }, d$"CTRL", l$"SICK")

But I get an error:

Error in FUN(X[[i]], ...) : unused argument (l$SICK)

1 answer:

Answer 0 (score: 0)

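You need to iterate over the taxa, not over the original list that holds the two data frames. I have edited your code slightly below; it should run the CTRL vs. SICK test for each taxon. I simulated some data so that it resembles what you have.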
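For context on the error itself: lapply(X, FUN, ...) calls FUN(X[[i]], ...), so any extra arguments are appended after the list element. In the question that means a two-parameter function receives three arguments, which is what "unused argument" reports. A minimal sketch that reproduces this, assuming a toy list `toy` and a hypothetical function `f` (neither is from the question):

```r
# Toy list standing in for the real data (hypothetical values)
toy <- list(CTRL = 1:3, SICK = 4:6)

# f accepts two arguments, like function(d, l) in the question
f <- function(d, l) d + l

# lapply calls f(toy[[i]], 10, 20): three arguments for a two-argument function
msg <- tryCatch(
  lapply(toy, f, 10, 20),
  error = function(e) conditionMessage(e)
)
print(msg)

# With a single extra argument the call is valid: f(toy[[i]], 10)
ok <- lapply(toy, f, 10)
print(ok$CTRL)
```

Even with the argument count fixed, looping over the list would visit one whole data frame at a time rather than one taxon at a time, which is why the loop has to run over the taxon names instead.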

# create data function: 3 patients (rows) x 1000 simulated taxa (columns)
makeData = function() {
  df = data.frame(matrix(rnorm(1000 * 3), 3, 1000))
  colnames(df) = paste("S", 1:1000, sep = "_")
  rownames(df) = letters[1:3]
  return(df)
}
# create two data.frames
mylist = list(
  CTRL = makeData(),
  SICK = makeData()
)
# check 
str(mylist)
# although you said species are the same
# just to be sure
# we take intersection of species names
SPECIES = intersect(names(mylist$CTRL), names(mylist$SICK))
# loop through species
p = sapply(SPECIES, function(i) {
  # Run wilcoxon by species: CTRL column vs. SICK column
  wilcox.test(mylist$CTRL[, i], mylist$SICK[, i], exact = FALSE)$p.value
})
# gives you p-value by species
head(as.data.frame(p))
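
Since the question asks for the results as a data frame (results.mw), one possible way to package the p-values is sketched here. It is an illustration rather than part of the original answer: the toy abundance values, the helper `make_group`, the column names `taxon` and `p.value`, and the use of mapply to walk both data frames column by column are all assumptions.

```r
# Toy data: 3 patients x 3 taxa per group (made-up values)
set.seed(1)
taxa <- c("Taxon_A", "Taxon_B", "Taxon_C")
make_group <- function() {
  df <- as.data.frame(matrix(runif(9), nrow = 3))
  names(df) <- taxa
  df
}
mylist <- list(CTRL = make_group(), SICK = make_group())

# Taxa present in both groups
common <- intersect(names(mylist$CTRL), names(mylist$SICK))

# Walk the two data frames column by column, one test per taxon
p <- mapply(
  function(x, y) wilcox.test(x, y, exact = FALSE)$p.value,
  mylist$CTRL[common],
  mylist$SICK[common]
)

# One row per taxon
results.mw <- data.frame(taxon = common, p.value = unname(p), row.names = NULL)
print(results.mw)
```

mapply pairs the columns by position, so both data frames are subset with `common` first to make sure the same taxon is compared on each side.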