I have a data frame named output.
For each zipcode, I want to get the mode (the most frequent value) of code, together with the count of unique patientID values for that zipcode.
I tried:
ddply(output,~zipcode,summarize,max=mode(code))
This code is meant to give the mode of code for each distinct zipcode, but what I want is the mode of code across the distinct patientID values within each zipcode.
output=data.frame(code=c("E78.5","N08","E78.5","I65.29","Z68.29","D64.9"),patientID=c("34423","34423","34423","34423","34424","34425"),zipcode=c(00718,00718,00718,00718,00718,00719),city=c("NAGUABO","NAGUABO","NAGUABO","NAGUABO","NAGUABO","NAGUABO"))
Answer 0 (score: 0)
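If I understand correctly, you need to find the most frequent code for each patientID and zipcode, in which case dplyr may help. I think you only need to use those three columns as grouping variables and then use summarise to get the count for each group. The row with the highest count is the mode, and the new column gives its count.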
# Your reprex data
output=data.frame(code=c("E78.5","N08","E78.5","I65.29","Z68.29","D64.9"),patientID=c("34423","34423","34423","34423","34424","34425"),zipcode=c(00718,00718,00718,00718,00718,00719),city=c("NAGUABO","NAGUABO","NAGUABO","NAGUABO","NAGUABO","NAGUABO"))
library(dplyr)
output %>%
  dplyr::group_by(patientID, code, zipcode) %>%
  dplyr::summarise(mode_freq = n())
# A tibble: 5 x 4
# Groups: patientID, code [5]
  patientID code   zipcode mode_freq
  <fct>     <fct>    <dbl>     <int>
1 34423     E78.5      718         2
2 34423     I65.29     718         1
3 34423     N08        718         1
4 34424     Z68.29     718         1
5 34425     D64.9      719         1
I included the dplyr:: prefix because I assume you already have plyr loaded, so the function names would otherwise conflict.
Update:
To get the mode output you suggested, which by definition is the row with the highest frequency:
output %>%
  group_by(patientID, code, zipcode) %>%
  summarise(mode_freq = n()) %>%
  ungroup() %>%
  group_by(zipcode) %>%
  filter(mode_freq == max(mode_freq))
# A tibble: 2 x 4
# Groups: zipcode [2]
  patientID code  zipcode mode_freq
  <fct>     <fct>   <dbl>     <int>
1 34423     E78.5     718         2
2 34425     D64.9     719         1
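The question also asks for the number of unique patientID values per zipcode, which the pipeline above does not return. Below is a minimal sketch of one way to put both in a single table, assuming the mode is taken over all rows of a zipcode and that ties should all be kept. The object names code_mode, patients_by_zip and result, and the column names code_freq and unique_patients, are illustrative and not from the original post.

```r
library(dplyr)

# Reprex data from the question (city column omitted, it is not needed here)
output <- data.frame(
  code      = c("E78.5", "N08", "E78.5", "I65.29", "Z68.29", "D64.9"),
  patientID = c("34423", "34423", "34423", "34423", "34424", "34425"),
  zipcode   = c(00718, 00718, 00718, 00718, 00718, 00719),
  stringsAsFactors = FALSE
)

# Frequency of each code within a zipcode, keeping only the most
# frequent code(s) per zipcode (ties are all kept)
code_mode <- output %>%
  dplyr::group_by(zipcode, code) %>%
  dplyr::summarise(code_freq = n()) %>%
  dplyr::ungroup() %>%
  dplyr::group_by(zipcode) %>%
  dplyr::filter(code_freq == max(code_freq)) %>%
  dplyr::ungroup()

# Number of distinct patients per zipcode
patients_by_zip <- output %>%
  dplyr::group_by(zipcode) %>%
  dplyr::summarise(unique_patients = dplyr::n_distinct(patientID))

# One row per zipcode and modal code, with the patient count alongside
result <- dplyr::left_join(code_mode, patients_by_zip, by = "zipcode")
print(result)
```

If the mode should instead count each code only once per patient, a dplyr::distinct(zipcode, patientID, code) step could be added before the first group_by; with this sample data that would leave every code in zipcode 718 tied at 1.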