Adding frequency counts by age group to a data frame in R

Time: 2017-07-31 14:09:27

Tags: r

I currently have data in a data frame that looks like this:

AgeGroup Med1 Med2 Med3 Med4 Med5 ...
1        "A1" "A2" "C4" "D3" "E3" ...
3        "A2" "C5" "9"  "6"  "9"  ...
2        "A1" "C2" "6"  "6"  "9"  ...
1        "6"  "6"  "A3" "B4" "9"  ...
2        ...........................
4        ...........................

(thousands more rows just like the above)

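There are four age groups and 20 Med variables. What I want is to count, for each age group, how many times each medication code occurs across all the Med columns. The final result would therefore contain the following information: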

AgeGroup  "A1" "A2" .... "E1" "E2" .... "9" "6"
1          4    6   ....  0    1   ....  40  20
2          0    ...........................
3          ................................
4          ................................

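I understand how to use the apply function to sum over rows, but in this case I want to sum multiple frequencies over all rows, grouped by age group. Is there a simple way to do this?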

1 Answer:

Answer 0: (score: 0)

Sample data based on your format:

df <- data.frame(AgeGroup = c(1, 1, 2),
                 Med1 = c("A1", "A2", "A1"),
                 Med2 = c("A2", "C5", "C2"),
                 Med3 = c("C4", "9", "6"),
                 stringsAsFactors = FALSE)

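Using dplyr and tidyr: use gather to convert your data to long format, count the occurrences of each Med value within each age group, and then use spread to return to wide format.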

library(dplyr)
library(tidyr)

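# Change to 1:20 for your data (assuming all columns have the prefix "Med")
thesecols <- paste0("Med", 1:3)

df %>%
  # all_of() selects the columns named in the character vector
  gather(key, value, all_of(thesecols)) %>%
  group_by(AgeGroup, value) %>%
  # count the non-missing occurrences of each code per age group
  summarise(count = sum(!is.na(value))) %>%
  spread(value, count)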

Output:

  AgeGroup   `6`   `9`    A1    A2    C2    C4    C5
1        1    NA     1     1     2    NA     1     1
2        2     1    NA     1    NA     1    NA    NA
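The question asks for 0 rather than NA where a code never occurs in an age group. A minimal sketch of one way to get that, assuming a tidyr version that provides pivot_longer and pivot_wider (1.0.0 or later) and that every medication column name starts with "Med"; the names counts, med_cols and tab are illustrative only:

```r
library(dplyr)
library(tidyr)

# Same sample data as above
df <- data.frame(AgeGroup = c(1, 1, 2),
                 Med1 = c("A1", "A2", "A1"),
                 Med2 = c("A2", "C5", "C2"),
                 Med3 = c("C4", "9", "6"),
                 stringsAsFactors = FALSE)

counts <- df %>%
  # Long format: one row per (AgeGroup, medication code)
  pivot_longer(starts_with("Med"), names_to = "key", values_to = "value") %>%
  # Drop missing codes so they do not get a column of their own
  filter(!is.na(value)) %>%
  # Number of occurrences of each code within each age group
  count(AgeGroup, value, name = "count") %>%
  # Back to wide format, filling absent combinations with 0 instead of NA
  pivot_wider(names_from = value, values_from = count,
              values_fill = list(count = 0L))

counts

# Base R cross-tabulation of the same data, for comparison.
# unlist() stacks the Med columns one after another, so AgeGroup
# is repeated once per column to stay aligned with the codes.
med_cols <- paste0("Med", 1:3)
tab <- table(AgeGroup = rep(df$AgeGroup, times = length(med_cols)),
             Med = unlist(df[med_cols]))
tab
```

For the real data, starts_with("Med") should pick up all 20 columns without listing them, and paste0("Med", 1:3) would become paste0("Med", 1:20) in the base R variant.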