I currently have data in a data frame that looks like this:
AgeGroup Med1 Med2 Med3 Med4 Med5 ...
1 "A1" "A2" "C4" "D3" "E3" ...
3 "A2" "C5" "9" "6" "9" ...
2 "A1" "C2" "6" "6" "9" ...
1 "6" "6" "A3" "B4" "9" ...
2 ...........................
4 ...........................
(thousands more rows just like the above)
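There are four age groups and 20 Med variables. What I want is to count, for each age group, how many times each medication code occurs across all rows. The final result would therefore contain the following information: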
AgeGroup "A1" "A2"...."E1" "E2" ... "9" "6"
1 4 6 .... 0 1 .... 40 20
2 0 ...........................
3 ................................
4 ................................
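I understand how to use the apply function to sum over rows, but in this case I want to sum multiple frequencies over all rows, grouped by age group. Is there a simple way to do this?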
Answer 0 (score: 0)
Sample data based on your format:
df <- data.frame(AgeGroup = c(1, 1, 2),
                 Med1 = c("A1", "A2", "A1"),
                 Med2 = c("A2", "C5", "C2"),
                 Med3 = c("C4", "9", "6"),
                 stringsAsFactors = FALSE)
Using dplyr and tidyr: use gather to convert your data to long format, count the occurrences of each Med value within each age group, then spread back to wide format.
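library(dplyr)
library(tidyr)

# Change 1:3 to 1:20 for your data (assuming all columns have the prefix "Med")
thesecols <- paste0("Med", 1:3)

df %>%
  # Long format: one row per (AgeGroup, Med column, code)
  gather(key, value, all_of(thesecols)) %>%
  group_by(AgeGroup, value) %>%
  # Count the non-missing occurrences of each code within each age group
  summarise(count = sum(!is.na(value))) %>%
  # Back to wide format: one column per code
  spread(value, count)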
Output:
AgeGroup `6` `9` A1 A2 C2 C4 C5
1 1 NA 1 1 2 NA 1 1
2 2 1 NA 1 NA 1 NA NA
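Note that a code which never occurs in an age group shows up as NA in the result above, whereas the question's desired output has 0 in those cells. If zeros are preferred, one option is a cross-tabulation with base R's table, which fills absent combinations with 0. The following is only a minimal sketch on the same sample data; the names med_cols, long and counts are illustrative and not part of the original answer, and it assumes every medication column starts with "Med".

```r
# Same sample data as above
df <- data.frame(AgeGroup = c(1, 1, 2),
                 Med1 = c("A1", "A2", "A1"),
                 Med2 = c("A2", "C5", "C2"),
                 Med3 = c("C4", "9", "6"),
                 stringsAsFactors = FALSE)

# Assumption: all medication columns share the prefix "Med"
med_cols <- grep("^Med", names(df), value = TRUE)

# Stack the Med columns into one long vector (column by column),
# repeating AgeGroup once per Med column so the rows stay aligned
long <- data.frame(
  AgeGroup = rep(df$AgeGroup, times = length(med_cols)),
  Med      = unlist(df[med_cols], use.names = FALSE)
)

# Cross-tabulate: rows are age groups, columns are medication codes,
# and combinations that never occur are counted as 0
counts <- table(long$AgeGroup, long$Med)
print(counts)
```

If a data frame is needed rather than a table object, as.data.frame.matrix(counts) should convert it while keeping one column per code.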