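I have a dataset of physician claims, in which a physician can submit claims under different specialties. I want to find the specialty each physician submits under most often, and replace all of that physician's specialty values with it.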
physician <- c("Mary", "Mary", "Mary", "Mary", "Mary", "Bob", "Bob", "Bob")
specialty <- c("GP", "PED", "DERM", "ANES", "GP", "DERM", "GP", "DERM")
data <- data.frame(physician, specialty, stringsAsFactors = FALSE)
data
physician specialty
Mary GP
Mary PED
Mary DERM
Mary ANES
Mary GP
Bob DERM
Bob GP
Bob DERM
I'm looking for a script that does not use a for loop and produces the following output:
data
physician specialty
Mary GP
Mary GP
Mary GP
Mary GP
Mary GP
Bob DERM
Bob DERM
Bob DERM
The actual data.frame has many more columns and physicians.
Answer 0 (score: 2)
You can use tapply. It splits the data into groups and applies a function to each group.
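# For each physician, tabulate the specialties and keep the most frequent one.
physician_max <- tapply(data$specialty, data$physician,
                        function(s) {
                          counts <- table(s)
                          names(counts)[which.max(counts)]
                        })
# Look up each row's physician by name to get that physician's top specialty.
data$specialty <- physician_max[as.character(data$physician)]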
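The same grouping idea can probably be written in one step with base R's ave(), which applies a function within each group and returns a vector aligned with the original rows. This is only a sketch under a few assumptions: the helper name most_common and the data frame name claims are made up for illustration, and the specialty column is assumed to be character rather than factor.

```r
# Hypothetical helper (not from the original answer): most frequent value in a vector.
most_common <- function(s) {
  counts <- table(s)
  names(counts)[which.max(counts)]
}

# Rebuild the example data from the question; `claims` is an illustrative name.
physician <- c("Mary", "Mary", "Mary", "Mary", "Mary", "Bob", "Bob", "Bob")
specialty <- c("GP", "PED", "DERM", "ANES", "GP", "DERM", "GP", "DERM")
claims <- data.frame(physician, specialty, stringsAsFactors = FALSE)

# ave() splits specialty by physician, applies most_common to each group,
# and recycles the single result back over that group's rows.
claims$specialty <- ave(claims$specialty, claims$physician, FUN = most_common)
claims
```

Because ave() only overwrites the one column, any other columns in a wider data.frame should be left untouched.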
Answer 1 (score: 0)
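# Cross-tabulate physician by specialty; naming the two columns explicitly
# keeps this working when the data.frame has more columns (and avoids reusing the name `t`).
tab <- table(data$physician, data$specialty)
# For each row of the table (margin 1 = physician), take the name of the largest count.
indicator <- apply(tab, 1, function(x) names(which.max(x)))
# Look up each row's physician by name and assign the result.
data$specialty <- indicator[as.character(data$physician)]
data  # print the updated data.frame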
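One detail worth knowing about both answers is how ties are resolved. which.max() returns the first maximum, and table() orders its entries by sorted value, so a tie should go to the specialty that sorts first. A small sketch, using a made-up physician with one claim under each of two specialties:

```r
# Illustrative tie: one claim each under "PED" and "GP" (invented data).
tied <- c("PED", "GP")

counts <- table(tied)                          # entries are ordered GP, PED
winner <- names(counts)[which.max(counts)]     # which.max() keeps the first maximum
winner
```

If a different tie-breaking rule is needed (for example, the most recent claim), the function passed to tapply or apply would have to encode it explicitly.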