Calculating indices by groups of factors and plotting them with ggplot

Time: 2019-05-09 16:15:36

Tags: r ggplot2 confidence-interval

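I want to calculate several indices, together with their confidence intervals, grouped by several factors, and display them in a chart using ggplot2.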

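In the columns, 1 = positive and 0 = negative; "individual = 1" means that one individual was tested. The following indices must be calculated for each combination of species + population + pathogen + dpi: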

...     Example: AL: yu: dv: 21dpi infectrate = (2/3)*100; dissemrate = (2/2)*100; transrate = (2/2)*100; st = (220+100)/2 ## mean for the population, the pathogen and the dpi

     AL: ti: dv: 21dpi infectrate = (2/4)*100

infectrate = (number positive for infect / number of individuals tested) * 100;
dissemrate = (number positive for dissem / number positive for infect) * 100;
transrate = (number positive for trans / number positive for dissem) * 100;
strate = mean(st);

species population  individual  pathogen    dpi infect  dissem  trans   st
AL  yu  1   dv  21  1   1   1   220
AL  yu  2   dv  21  1   1   1   100
AL  yu  3   dv  21  0   0   0   0
AL  ti  1   dv  21  0   0   0   0
AL  ti  2   dv  21  1   1   1   60
AL  ti  3   dv  21  1   1   0   0
AL  ti  4   dv  21  0   0   0   0
AA  dla 1   dv  21  1   1   1   180
AA  dla 2   dv  21  1   1   0   0
AA  dla 3   dv  21  1   1   1   360
AL  yu  1   zk  21  0   0   0   0
AL  yu  2   zk  21  0   0   0   0
AA  mra 1   zk  14  1   1       
AA  mra 2   zk  14  1   1       
AA  yu  1   yv  21  0   0   0   0
AA  yu  2   yv  21  1   1   0   0
AL  bz  1   zk  14  1   1       
AL  bz  2   zk  14  1   1       

I've tried to use the dplyr package, but I didn't succeed.

...

When I run my code, it gives the same index values for every population.

Any help would be appreciated, thanks.

1 Answer:

Answer 0 (score: 0)

I'm not sure I fully understand the calculations, but I think this is what you are looking for.

library(tidyverse)

df <-
  data.frame(stringsAsFactors=FALSE,
      species = c("AL", "AL", "AL", "AL", "AL", "AL", "AL", "AA", "AA", "AA",
                  "AL", "AL", "AA", "AA", "AA", "AA", "AL", "AL"),
   population = c("yu", "yu", "yu", "ti", "ti", "ti", "ti", "dla", "dla",
                  "dla", "yu", "yu", "mra", "mra", "yu", "yu", "bz", "bz"),
   individual = c(1, 2, 3, 1, 2, 3, 4, 1, 2, 3, 1, 2, 1, 2, 1, 2, 1, 2),
     pathogen = c("dv", "dv", "dv", "dv", "dv", "dv", "dv", "dv", "dv", "dv",
                  "zk", "zk", "zk", "zk", "yv", "yv", "zk", "zk"),
          dpi = c(21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 14, 14, 21,
                  21, 14, 14),
       infect = c(1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1),
       dissem = c(1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1),
        trans = c(1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, NA, NA, 0, 0, NA, NA),
           st = c(220, 100, 0, 0, 60, 0, 0, 180, 0, 360, 0, 0, NA, NA, 0, 0,
                  NA, NA)
)

# infectrate = (number positive for infect / number of individuals tested) * 100;
# dissemrate = (number positive for dissem / number positive for infect) * 100;
# transrate = (number positive for trans / number positive for dissem) * 100;
# strate = mean(st);

df %>% 
  group_by(species, population, pathogen, dpi) %>% 
  summarise(
    infectrate = sum(infect)/n()*100,
    dissemrate = ifelse(infectrate == 0, 0, sum(dissem)/sum(infect)*100),
    transrate = ifelse(dissemrate == 0, 0, sum(trans)/sum(dissem)*100),
    strate = mean(st)
  ) %>% 
  ungroup()
#> df
# A tibble: 7 x 8
#  species population pathogen   dpi infectrate dissemrate transrate strate
#  <chr>   <chr>      <chr>    <dbl>      <dbl>      <dbl>     <dbl>  <dbl>
#1 AA      dla        dv          21      100          100      66.7   180 
#2 AA      mra        zk          14      100          100      NA      NA 
#3 AA      yu         yv          21       50          100       0       0 
#4 AL      bz         zk          14      100          100      NA      NA 
#5 AL      ti         dv          21       50          100      50      15 
#6 AL      yu         dv          21       66.7        100     100     107.
#7 AL      yu         zk          21        0            0       0       0
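
The question also asks for confidence intervals and a ggplot2 chart, which the summary above does not cover. The following is only a minimal sketch for the infection rate, assuming that exact (Clopper-Pearson) binomial intervals from base R's binom.test() are acceptable here. The names rates, n_tested, n_infected, ci_low and ci_high are illustrative and do not come from the original post, and the layout of the plot (bars per population, one panel per species) is just one possible choice.

```r
library(dplyr)
library(ggplot2)

# Same rows as in the answer above, restricted to the columns needed here.
df <- data.frame(
  species    = c(rep("AL", 7), rep("AA", 3), "AL", "AL", rep("AA", 4), "AL", "AL"),
  population = c(rep("yu", 3), rep("ti", 4), rep("dla", 3), "yu", "yu",
                 "mra", "mra", "yu", "yu", "bz", "bz"),
  pathogen   = c(rep("dv", 10), rep("zk", 4), "yv", "yv", "zk", "zk"),
  dpi        = c(rep(21, 12), 14, 14, 21, 21, 14, 14),
  infect     = c(1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Count tested and infected individuals per group, then derive the rate.
rates <- df %>%
  group_by(species, population, pathogen, dpi) %>%
  summarise(n_tested = n(), n_infected = sum(infect)) %>%
  ungroup() %>%
  mutate(infectrate = 100 * n_infected / n_tested)

# Assumption: exact binomial 95% intervals, computed one group at a time.
# mapply() returns a 2 x n matrix: row 1 = lower bound, row 2 = upper bound.
ci <- mapply(function(x, n) binom.test(x, n)$conf.int,
             rates$n_infected, rates$n_tested)
rates$ci_low  <- 100 * ci[1, ]
rates$ci_high <- 100 * ci[2, ]

# Bars for the rate, error bars for the interval, one panel per species.
p <- ggplot(rates, aes(x = population, y = infectrate, fill = pathogen)) +
  geom_col(position = position_dodge(width = 0.9)) +
  geom_errorbar(aes(ymin = ci_low, ymax = ci_high),
                position = position_dodge(width = 0.9), width = 0.2) +
  facet_wrap(~ species, scales = "free_x") +
  labs(x = "Population", y = "Infection rate (%)", fill = "Pathogen")
print(p)
```

With only two to four individuals per group the intervals will be very wide. The same pattern could presumably be repeated for dissemrate and transrate by swapping in the matching numerator and denominator counts, leaving out groups where the denominator is zero or missing. Note also that mean(st) in the answer averages over every tested individual, zeros included, which gives about 107 for AL/yu/dv, whereas the example in the question averages only the two positive individuals (160).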