Run binom.test on a table and save the p-values

Date: 2017-09-18 08:53:23

Tags: r statistics

              TF    data   TEs  open   cbp regregion           
           <chr>   <chr> <int> <int> <int>     <int>
 1           ALL control  5412   489   815      1272
 2        Caudal    chip  1188   136   115       278
 3           HSF    chip   712    74    59       191
 4        Dorsal    chip   490    34    30       155
 5 Tango (HIF1B)    chip  1145   132   107       269

...

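I want to run binom.test on a dataset like the one above (the real one is larger). My expected frequencies are in the first row. I have come up with code that runs the test for each variable I want to test (open, cbp, regregion). However, I can't manage to store the p-value of each test in a column.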

# Expected proportions, taken from the ALL (control) row; 5412 is its TEs count
input_tests$open_c<-subset(input_tests, TF=='ALL')$open/5412
input_tests$cbp_c<-subset(input_tests, TF=='ALL')$cbp/5412
input_tests$regregion_c<-subset(input_tests, TF=='ALL')$regregion/5412

test <- function(x, n, p){binom.test(x, n, p, alternative="two.sided")}

???????

input_tests$results<-mapply(test, input_tests$open, input_tests$TEs, input_tests$open_c)

2 answers:

Answer 0 (score: 1)

I recommend dplyr and broom for this kind of task. I don't fully understand your data or how you are using it, so I made up my own.

The broom and dplyr vignette is very good.

library(dplyr)
library(broom)

dat <- data.frame(age_group = c(1,2,3,4,5), 
                  cases = c(10, 5, 3,2, 0), 
                  n_participants = c(100,200, 300, 200, 100)
                  )

dat

  age_group cases n_participants
1         1    10            100
2         2     5            200
3         3     3            300
4         4     2            200
5         5     0            100

binom.test(x = dat$cases[1], n = dat$n_participants[1], p = 0.5, alternative = "two.sided")
    Exact binomial test

data:  dat$cases[1] and dat$n_participants[1]
number of successes = 10, number of trials = 100, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.5
95 percent confidence interval:
 0.04900469 0.17622260
sample estimates:
probability of success 
                   0.1 
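binom.test() returns an object of class "htest", which is a list; the p-value is stored in its p.value element. That is the piece missing from the question's mapply() call, which returns whole test objects. A minimal sketch on the example data above, keeping the answer's p = 0.5:

```r
# Same example data frame as in the answer above
dat <- data.frame(age_group = c(1, 2, 3, 4, 5),
                  cases = c(10, 5, 3, 2, 0),
                  n_participants = c(100, 200, 300, 200, 100))

# binom.test() returns an "htest" list; $p.value holds the p-value
res <- binom.test(x = dat$cases[1], n = dat$n_participants[1],
                  p = 0.5, alternative = "two.sided")
class(res)     # "htest"
res$p.value    # the p-value as a plain number

# Extracting $p.value inside mapply() yields one number per row,
# so the result can be stored directly as a column
dat$p.value <- mapply(function(x, n) binom.test(x, n, p = 0.5)$p.value,
                      dat$cases, dat$n_participants)
dat
```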

# binom.test() uses its default p = 0.5 here; do() still works but is superseded in dplyr >= 1.0.0
dat2 <- dat %>% 
  group_by(age_group) %>%
  do(tidy(binom.test(.$cases, .$n_participants, alternative = "two.sided")))
dat2
# A tibble: 5 x 9
# Groups:   age_group [5]
  age_group estimate statistic      p.value parameter    conf.low  conf.high              method alternative
      <dbl>    <dbl>     <dbl>        <dbl>     <dbl>       <dbl>      <dbl>              <fctr>      <fctr>
1         1    0.100        10 3.063290e-17       100 0.049004689 0.17622260 Exact binomial test   two.sided
2         2    0.025         5 3.238045e-51       200 0.008166166 0.05737435 Exact binomial test   two.sided
3         3    0.010         3 4.418431e-84       300 0.002067007 0.02894451 Exact binomial test   two.sided
4         4    0.010         2 2.501777e-56       200 0.001213349 0.03565467 Exact binomial test   two.sided
5         5    0.000         0 1.577722e-30       100 0.000000000 0.03621669 Exact binomial test   two.sided

dat <- left_join(dat, select(dat2, age_group, p.value), by = "age_group")
dat
  age_group cases n_participants      p.value
1         1    10            100 3.063290e-17
2         2     5            200 3.238045e-51
3         3     3            300 4.418431e-84
4         4     2            200 2.501777e-56
5         5     0            100 1.577722e-30
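The answer above tests every group against p = 0.5 (the binom.test() default). In the question, the expected proportion for each column comes from the ALL/control row instead. Below is a base-R sketch of the same one-row-per-test layout that tidy() produces, assuming TF == "ALL" marks the control row and using only the five rows printed in the question (the real dataset is larger). The long layout makes it easy to filter or adjust p-values across all tests at once.

```r
# The rows printed in the question (the full dataset is larger)
input_tests <- data.frame(
  TF = c("ALL", "Caudal", "HSF", "Dorsal", "Tango (HIF1B)"),
  data = c("control", "chip", "chip", "chip", "chip"),
  TEs = c(5412L, 1188L, 712L, 490L, 1145L),
  open = c(489L, 136L, 74L, 34L, 132L),
  cbp = c(815L, 115L, 59L, 30L, 107L),
  regregion = c(1272L, 278L, 191L, 155L, 269L),
  stringsAsFactors = FALSE
)

vars <- c("open", "cbp", "regregion")
control <- input_tests[input_tests$TF == "ALL", ]  # assumed control row
chip <- input_tests[input_tests$TF != "ALL", ]

# One row per TF x variable, similar in shape to the tidy() output above
long <- do.call(rbind, lapply(vars, function(v) {
  p0 <- control[[v]] / control$TEs  # expected proportion from the ALL row
  data.frame(
    TF = chip$TF,
    variable = v,
    estimate = chip[[v]] / chip$TEs,
    p.value = mapply(function(x, n) binom.test(x, n, p = p0)$p.value,
                     chip[[v]], chip$TEs),
    stringsAsFactors = FALSE
  )
}))
long
```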

Answer 1 (score: 0)

# Return only the p-value, so mapply() gives a numeric vector that fits in a column
test <- function(x, n, p){binom.test(x, n, p, alternative="two.sided")$p.value}
input_tests$open.p.value<-mapply(test, input_tests$open, input_tests$TEs, input_tests$open_c)
input_tests$cbp.p.value<-mapply(test, input_tests$cbp, input_tests$TEs, input_tests$cbp_c)
input_tests$regregion.p.value<-mapply(test, input_tests$regregion, input_tests$TEs, input_tests$regregion_c)

This works for me. I'm sure there is plenty of room for improvement.
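One possible tidy-up of this answer (not part of the original) is to loop over the column names instead of repeating a line per variable, and to take the expected proportions from the ALL row rather than hard-coding 5412. The sketch assumes TF == "ALL" marks the control row and uses only the five rows printed in the question; the `<var>.p.value` column names follow the answer.

```r
# The rows printed in the question (the full dataset is larger)
input_tests <- data.frame(
  TF = c("ALL", "Caudal", "HSF", "Dorsal", "Tango (HIF1B)"),
  data = c("control", "chip", "chip", "chip", "chip"),
  TEs = c(5412L, 1188L, 712L, 490L, 1145L),
  open = c(489L, 136L, 74L, 34L, 132L),
  cbp = c(815L, 115L, 59L, 30L, 107L),
  regregion = c(1272L, 278L, 191L, 155L, 269L),
  stringsAsFactors = FALSE
)

# Expected proportions come from the ALL row instead of a hard-coded 5412
control <- input_tests[input_tests$TF == "ALL", ]

test <- function(x, n, p) {
  binom.test(x, n, p, alternative = "two.sided")$p.value
}

# Add one <variable>.p.value column per tested variable
for (v in c("open", "cbp", "regregion")) {
  p0 <- control[[v]] / control$TEs
  input_tests[[paste0(v, ".p.value")]] <-
    mapply(test, input_tests[[v]], input_tests$TEs, MoreArgs = list(p = p0))
}
input_tests
```

Adding a variable to the test then only means extending the vector in the for loop.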