How to aggregate different variables across two columns in R

Time: 2018-04-20 11:58:45

Tags: r

My data frame looks like this -

Numerator  Denominator  Proportion  StudyQuarter  NewPatGroup  Measure
120        320          0.37        1             A&B/A&B&C    ExposedDays/PatientDays

and there are many such combinations of entries in the columns 'PatGroup' and 'Variable'.

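I would like a function that lets me select a combination of entries from the column 'PatGroup' and a combination of entries from the column 'Variable' to obtain the desired output. For example, I want to compute a proportion whose numerator is the sum of the values of the variable ExposedDays over PatGroups A and B, and whose denominator is the sum of the values of the variables ExposedDays and PatientDays over PatGroups A, B and C.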

The output would look like -

{{1}}

Can someone help me with this?

1 answer:

Answer 0: (score: 1)

To be honest, I'm not sure what the point of aggregating the data in the way you propose is, but you could do it like this:

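library(tidyverse)  # provides %>%, group_by() and summarise() via dplyr

# `df` is the long-format sample data defined further down
df %>%
    group_by(StudyQuarter) %>%
    summarise(
        # numerator: ExposedDays summed over PatGroups A and B
        Numerator = sum(Value[Variable == "ExposedDays" & PatGroup %in% c("A", "B")]),
        # denominator: ExposedDays and PatientDays summed over PatGroups A, B and C
        Denominator = sum(Value[Variable %in% c("ExposedDays", "PatientDays") & PatGroup %in% c("A", "B", "C")]),
        Proportion = Numerator / Denominator,
        NewPatGroup = "A&B/A&B&C",
        Measure = "ExposedDays/PatientDays")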
## A tibble: 2 x 6
#  StudyQuarter Numerator Denominator Proportion NewPatGroup Measure
#         <int>     <int>       <int>      <dbl> <chr>       <chr>
#1            1       120         320      0.375 A&B/A&B&C   ExposedDays/Patien…
#2            2        90         110      0.818 A&B/A&B&C   ExposedDays/Patien…

Sample data

df <- read.table(text =
    "PatGroup     Variable       Value       StudyQuarter
A            PatientDays    100         1
B            ExposedDays    80          1
A            ExposedDays    40          1
A            Patients       40          1
C            ExposedDays    10          1
C            PatientDays    90          1
A            PatientDays    20          2
B            ExposedDays    90          2", header = TRUE)
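
Since the question asks for a function in which the combinations can be chosen freely, the same summarise() call could be wrapped as in the minimal sketch below. The helper name group_proportion and its arguments are hypothetical and not part of the original answer. The sketch assumes the long-format columns PatGroup, Variable, Value and StudyQuarter from the sample data, which is rebuilt inside the block so that it runs on its own.

```r
library(dplyr)

# Sample data from the answer, rebuilt here so the sketch is self-contained
df <- data.frame(
    PatGroup     = c("A", "B", "A", "A", "C", "C", "A", "B"),
    Variable     = c("PatientDays", "ExposedDays", "ExposedDays", "Patients",
                     "ExposedDays", "PatientDays", "PatientDays", "ExposedDays"),
    Value        = c(100L, 80L, 40L, 40L, 10L, 90L, 20L, 90L),
    StudyQuarter = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L)
)

# Hypothetical helper (an assumption, not from the original answer):
# per StudyQuarter, sum Value over the chosen Variable/PatGroup combinations
# for the numerator and the denominator, then take their ratio.
group_proportion <- function(data, num_vars, num_groups, den_vars, den_groups) {
    data %>%
        group_by(StudyQuarter) %>%
        summarise(
            Numerator   = sum(Value[Variable %in% num_vars & PatGroup %in% num_groups]),
            Denominator = sum(Value[Variable %in% den_vars & PatGroup %in% den_groups]),
            Proportion  = Numerator / Denominator,
            # labels are built from the selections, e.g. "A&B/A&B&C"
            NewPatGroup = paste0(paste(num_groups, collapse = "&"), "/",
                                 paste(den_groups, collapse = "&")),
            Measure     = paste0(paste(num_vars, collapse = "&"), "/",
                                 paste(den_vars, collapse = "&"))
        )
}

# Same selection as in the answer above
result <- group_proportion(
    df,
    num_vars = "ExposedDays", num_groups = c("A", "B"),
    den_vars = c("ExposedDays", "PatientDays"), den_groups = c("A", "B", "C")
)
print(result)
```

With the sample data this should give the same Numerator, Denominator and Proportion per quarter as the hard-coded version (120/320 and 90/110). The Measure label differs from the answer's, because it is generated from every variable selected for the denominator rather than typed by hand.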