Summary of information extracted from a data frame

Time: 2019-07-30 00:01:57

Tags: r dplyr data.table

data <- data.frame(
  STUDY  = rep(1L, 4),
  ID     = 1:4,
  BASE   = c(100, NA, 16, 15),
  CYCLE1 = c(30, 20, NA, 10),
  DIED   = c("No", "Yes", "Yes", "Yes"),
  PROG   = c("Yes", "No", "Yes", "Yes")
)

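I would like to summarize the following:

  1. How many subjects have both a BASE and a CYCLE1 value?
  2. Of the subjects in 1, how many died?
  3. Of the subjects in 1, how many have DIED or PROG?

Expected answers:

  1. 2 subjects (50% of subjects) ==> subjects 1 and 4
  2. 1 subject (25%) ==> subject 4
  3. 2 subjects (50%) ==> subjects 1 and 4

A summary table by STUDY (showing counts and percentages) would be great. I am using RStudio.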
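The expected answers follow from three logical vectors over the subjects, where each later condition is restricted to the subjects in condition 1. A minimal base-R sketch, assuming the data is rebuilt by hand from the example above; the names i1, i2, i3, counts and percents are illustrative and not from the original post:

```r
# Rebuild the example data (assumption: column types as shown in the question)
data <- data.frame(
  STUDY  = rep(1L, 4),
  ID     = 1:4,
  BASE   = c(100, NA, 16, 15),
  CYCLE1 = c(30, 20, NA, 10),
  DIED   = c("No", "Yes", "Yes", "Yes"),
  PROG   = c("Yes", "No", "Yes", "Yes")
)

# 1. Subjects with both a BASE and a CYCLE1 value
i1 <- !is.na(data$BASE) & !is.na(data$CYCLE1)
# 2. Of those, subjects who died
i2 <- i1 & data$DIED == "Yes"
# 3. Of those, subjects who died or progressed
i3 <- i1 & (data$DIED == "Yes" | data$PROG == "Yes")

# Percentages use all subjects as the denominator, as in the expected answers
counts   <- c(q1 = sum(i1), q2 = sum(i2), q3 = sum(i3))
percents <- 100 * counts / nrow(data)

counts       # q1 = 2, q2 = 1, q3 = 2
percents     # q1 = 50, q2 = 25, q3 = 50
data$ID[i3]  # 1 4
```

Summing a logical vector counts its TRUE values, which is the same idea the dplyr answer relies on inside each STUDY group.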

1 answer:

Answer (score: 1)

Based on the first condition, group by STUDY, filter the rows where both BASE and CYCLE1 are non-NA, and then summarise the counts:

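library(dplyr)
library(stringr)

data %>%
   group_by(STUDY) %>%
   # keep only subjects with both a BASE and a CYCLE1 value (condition 1)
   filter(!is.na(BASE) & !is.na(CYCLE1)) %>%
   summarise(ID = str_c(ID, collapse = ", "),           # IDs of those subjects
             n1 = n(),                                  # condition 1
             n2 = sum(DIED == "Yes"),                   # condition 1 and died
             n3 = sum(DIED == "Yes" | PROG == "Yes"))   # condition 1 and died or progressed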
# A tibble: 1 x 5
#  STUDY ID       n1    n2    n3
#  <int> <chr> <int> <int> <int>
#1     1 1, 4      2     1     2

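If we also need the percentages (relative to all subjects in each STUDY), the code can be modified further to format the output: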

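out <- data %>%
        group_by(STUDY) %>%
        # flag and count before filtering, so the percentage
        # denominator is all subjects in the STUDY
        mutate(i1 = !is.na(BASE) & !is.na(CYCLE1),          # condition 1
          perc1 = 100 * mean(i1),
          n1 = sum(i1),
          i2 = DIED == "Yes" & i1,                          # condition 1 and died
          perc2 = 100 * mean(i2),
          n2 = sum(i2),
          i3 = (DIED == "Yes" | PROG == "Yes") & i1,        # condition 1 and died or progressed
          perc3 = 100 * mean(i3),
          n3 = sum(i3)) %>%
        filter(i1) %>%
        # anchored patterns so only perc1-perc3 and n1-n3 are selected
        select(STUDY, ID, matches("^perc"), matches("^n[0-9]$")) %>%
        mutate(ID = toString(ID)) %>%   # collapse the IDs within each STUDY
        slice(1)                        # one row per STUDY
out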
# A tibble: 1 x 8
# Groups:   STUDY [1]
#  STUDY ID    perc1 perc2 perc3    n1    n2    n3
#  <int> <chr> <dbl> <dbl> <dbl> <int> <int> <int>
#1     1 1, 4     50    25    50     2     1     2
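Since the question is also tagged data.table, here is a rough data.table equivalent of the same per-STUDY summary. It is a sketch, assuming the data.table package is installed and the data has the columns shown in the question; the names DT, out_dt and the helper columns i1, i2, i3 are illustrative, not from the original answer.

```r
library(data.table)

# Same example data as in the question (assumed column types)
data <- data.frame(
  STUDY  = rep(1L, 4),
  ID     = 1:4,
  BASE   = c(100, NA, 16, 15),
  CYCLE1 = c(30, 20, NA, 10),
  DIED   = c("No", "Yes", "Yes", "Yes"),
  PROG   = c("Yes", "No", "Yes", "Yes")
)

DT <- as.data.table(data)

# Add the three flags by reference, one per question
DT[, i1 := !is.na(BASE) & !is.na(CYCLE1)]
DT[, i2 := i1 & DIED == "Yes"]
DT[, i3 := i1 & (DIED == "Yes" | PROG == "Yes")]

# Aggregate per STUDY; mean() of a flag is the share of all subjects in the group
out_dt <- DT[, .(
  ID    = toString(ID[i1]),
  perc1 = 100 * mean(i1),
  perc2 = 100 * mean(i2),
  perc3 = 100 * mean(i3),
  n1    = sum(i1),
  n2    = sum(i2),
  n3    = sum(i3)
), by = STUDY]

out_dt
```

Computing the flags on all rows and aggregating once avoids the mutate/filter/slice round trip, because the IDs are subset with ID[i1] inside the aggregation instead of filtering rows first.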