Question

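I have a dataset of physician claims, in which a physician can submit claims under different specialties. I want to find the specialty each physician submits most often, and replace all of that physician's specialty values with that most frequent specialty.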

physician <- c("Mary","Mary","Mary","Mary","Mary","Bob","Bob","Bob")
specialty <- c("GP","PED","DERM","ANES","GP","DERM","GP","DERM")
data <- data.frame(physician, specialty)  # build the data frame directly; cbind() would coerce every column to one type

data
physician   specialty
Mary        GP
Mary        PED
Mary        DERM
Mary        ANES
Mary        GP
Bob         DERM
Bob         GP
Bob         DERM

I am looking for a script that produces the following output without using a for loop:

data
physician   specialty
Mary        GP
Mary        GP
Mary        GP
Mary        GP
Mary        GP
Bob         DERM
Bob         DERM
Bob         DERM

The actual data.frame has many more columns and physicians.

Answer 1

You can use tapply. It splits the data into groups and applies a function to each group.

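# For each physician, tabulate the specialties and keep the most frequent one
physician_max <- tapply(data$specialty, data$physician,
                        function(s) {
                            counts <- table(s)
                            names(counts)[which.max(counts)]
                        })
# Look up each row's physician by name (not by factor code) and
# store the result as a plain character vector
data$specialty <- as.vector(physician_max[as.character(data$physician)])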
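
To make the grouping step easier to follow, here is a minimal sketch that inspects the intermediate results on the sample data from the question. The names counts_by_physician, most_common and tie are introduced only for illustration. The last lines show how ties are resolved: which.max() returns the first maximum, and table() orders its entries by sorted value, so a tie goes to the specialty that sorts first.

```r
# Sample data from the question
physician <- c("Mary","Mary","Mary","Mary","Mary","Bob","Bob","Bob")
specialty <- c("GP","PED","DERM","ANES","GP","DERM","GP","DERM")

# One frequency table per physician (tapply returns a list here,
# because table() gives more than one value per group)
counts_by_physician <- tapply(specialty, physician, table)
counts_by_physician[["Mary"]]

# Most frequent specialty per physician, named by physician
most_common <- tapply(specialty, physician,
                      function(s) names(which.max(table(s))))
most_common

# Tie-breaking: both specialties appear twice, the one sorting first wins
tie <- c("PED", "GP", "PED", "GP")
names(which.max(table(tie)))  # "GP"
```

If a different tie-breaking rule is needed (for example, the most recent claim), the anonymous function is the place to change it.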

Answer 2

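counts <- table(data$physician, data$specialty) # cross-tabulate only these two columns; avoids masking base::t
indicator <- apply(counts, # apply over the table
                   1, # margin 1: one row per physician
                   function(x) names(which.max(x))) # name of the specialty with the max count
data$specialty <- unname(indicator[as.character(data$physician)]) # look up by physician name
data  # print your new df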
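
As a possible variation on the two answers, base R's ave() can do the grouping and the write-back in a single call, which may be convenient given that the real data frame has more columns. This is only a sketch: the data frame name claims, the extra column amount with its made-up values, and the helper most_frequent are assumptions for illustration and do not come from the question.

```r
# Sample data from the question plus a hypothetical extra column
claims <- data.frame(
  physician = c("Mary","Mary","Mary","Mary","Mary","Bob","Bob","Bob"),
  specialty = c("GP","PED","DERM","ANES","GP","DERM","GP","DERM"),
  amount    = c(10, 20, 30, 40, 50, 60, 70, 80),  # made-up values
  stringsAsFactors = FALSE
)

# Hypothetical helper: the most frequent value in a vector
most_frequent <- function(s) names(which.max(table(s)))

# ave() applies the helper within each physician group and recycles the
# single result over that group's rows; other columns are left untouched
claims$specialty <- ave(claims$specialty, claims$physician, FUN = most_frequent)
claims
```

The helper returns one value per group, and ave() fills every row of the group with it, so no separate lookup step is needed.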

R - Find the maximum frequency and replace values without using a loop
