R - Find the most frequent value and replace values without using a loop

Time: 2016-08-09 19:08:31

Tags: r loops for-loop replace max

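I have a data set of physician claims, in which a physician can submit claims under different specialties. I want to find the specialty each physician submits most often and replace all of that physician's specialty values with it.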

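physician <- c("Mary","Mary","Mary","Mary","Mary","Bob","Bob","Bob")
specialty <- c("GP","PED","DERM","ANES","GP","DERM","GP","DERM")
# Build the data frame directly; stringsAsFactors = FALSE keeps both columns as character
data <- data.frame(physician, specialty, stringsAsFactors = FALSE)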

data
physician   specialty
Mary        GP
Mary        PED
Mary        DERM
Mary        ANES
Mary        GP
Bob         DERM
Bob         GP
Bob         DERM

I am looking for a script that produces the following output without using a for loop:

data
physician   specialty
Mary        GP
Mary        GP
Mary        GP
Mary        GP
Mary        GP
Bob         DERM
Bob         DERM
Bob         DERM

The actual data.frame has many more columns and physicians.

2 answers:

Answer 0: (score: 2)

You can use tapply. It splits the data into groups and applies a function to each group.

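# For each physician, tabulate their specialties and keep the most frequent one
physician_max <- tapply(data$specialty, data$physician,
                        function(s) {
                            counts <- table(s)
                            names(counts)[which.max(counts)]
                        })
# Look up each row's physician by name; as.character() guards against factor
# columns, and as.vector() drops the names and array attributes from the result
data$specialty <- as.vector(physician_max[as.character(data$physician)])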
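
A compact variant of the same idea uses base R's ave(), which applies a function per group and writes the result back in the original row order, so no separate lookup step is needed. The following is a minimal sketch on the sample data from the question, not part of the original answer; most_common is a helper name introduced here for illustration. Ties go to the alphabetically first specialty, because table() sorts its names and which.max() returns the first maximum.

```r
# Sample data from the question, kept as plain character columns
data <- data.frame(
  physician = c("Mary", "Mary", "Mary", "Mary", "Mary", "Bob", "Bob", "Bob"),
  specialty = c("GP", "PED", "DERM", "ANES", "GP", "DERM", "GP", "DERM"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the most frequent value in a vector
most_common <- function(s) names(which.max(table(s)))

# ave() evaluates most_common() once per physician and fills
# every row of that physician's group with the single result
data$specialty <- ave(data$specialty, data$physician, FUN = most_common)
data
```

Because ave() only touches the vector it is given, any other columns in a wider data.frame are left unchanged.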

Answer 1: (score: 0)

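# Cross-tabulate physicians (rows) against specialties (columns); naming the two
# columns keeps this working when data has more columns, and avoids masking base::t
counts <- table(data$physician, data$specialty)
# For each physician (margin 1 = rows), take the name of the most frequent specialty
indicator <- apply(counts, 1, function(x) names(which.max(x)))
# Look up each row's physician by name and assign the resulting vector
data$specialty <- as.vector(indicator[as.character(data$physician)])
data  # print the updated data frame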
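
Since the question notes that the real data.frame has more columns, here is a hedged sketch of the cross-tabulation approach on a wider table. The amount column and its values are made up purely for illustration, and max.col() is used here instead of apply() as an alternative way to pick the largest count per row; ties.method = "first" makes tie-breaking deterministic (max.col() breaks ties at random by default).

```r
# Sample data plus a hypothetical extra column (amount) to mimic a wider data.frame
data <- data.frame(
  physician = c("Mary", "Mary", "Mary", "Mary", "Mary", "Bob", "Bob", "Bob"),
  specialty = c("GP", "PED", "DERM", "ANES", "GP", "DERM", "GP", "DERM"),
  amount    = c(100, 80, 120, 300, 90, 110, 95, 130),
  stringsAsFactors = FALSE
)

# Cross-tabulate only the two relevant columns: rows = physicians, columns = specialties
counts <- table(data$physician, data$specialty)

# For each row, max.col() gives the column index of the largest count
top <- colnames(counts)[max.col(counts, ties.method = "first")]
names(top) <- rownames(counts)

# Map each row's physician to that physician's most frequent specialty
data$specialty <- unname(top[data$physician])
data
```

Only the specialty column is overwritten; amount and any other columns keep their original values.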