library(tibble)
ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)
df <- tibble(ID, Value)
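I need to group by ID and keep the most frequent Value within each group.
If there is a tie (as for ID == "B"), the first value should be chosen.
The resulting variable should look like this: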
Value_output <- c("blue", "blue", "blue", "red", "red", NA)
Answer 0 (score: 2)
A solution using the data.table package (count Value by ID).
ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)
library(data.table)
foo <- data.table(ID, Value)
setkey(foo, ID)
# Count each (ID, Value) pair, sort by count, keep the top Value per ID, then join back to foo
foo[foo[, .N, .(ID, Value)][order(N, decreasing = TRUE)][, .(Value = Value[1]), ID]]$i.Value
[1] "blue" "blue" "blue" "red" "red" NA
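The one-liner above chains several steps. As a rough sketch, the same idea can be unrolled as follows; the intermediate names counts, top and result, and the column name Value_output, are only illustrative and are not part of the original answer. It relies on data.table keeping the original row order for tied counts when sorting.

```r
library(data.table)

ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)
foo <- data.table(ID, Value)

# Step 1: count how often each (ID, Value) pair occurs
counts <- foo[, .N, by = .(ID, Value)]

# Step 2: sort by count, largest first (tied rows keep their original order)
counts <- counts[order(-N)]

# Step 3: keep the first, i.e. most frequent, Value per ID
top <- counts[, .(Value_output = Value[1]), by = ID]

# Step 4: join back so every original row gets its group's top Value
result <- top[foo, on = "ID"]
result$Value_output
```

Using on = "ID" in the last step avoids having to set a key first; the rows of result follow the order of foo.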
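Answer 1 (score: 2)
We can get the mode of Value within each group, with Mode defined as
Mode <- function(x) {
  ux <- unique(x)                        # distinct values, in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]  # first value with the highest count
}
library(dplyr)
df %>%
  group_by(ID) %>%
  arrange(ID, is.na(Value)) %>% # put NA last so that a non-NA value wins a tie
  mutate(Value_output = Mode(Value))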
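To see why the Mode helper from this answer satisfies the tie rule in the question, here is a small check on the three groups of the example data. The variable names res_a, res_b and res_c are only illustrative. Because which.max returns the position of the first maximum, a tie should go to the value that appears first.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

res_a <- Mode(c("blue", "blue", "green"))  # clear winner: "blue"
res_b <- Mode(c("red", "orange"))          # tie: the first value, "red"
res_c <- Mode(NA_character_)               # only NA in the group: NA

res_a
res_b
res_c
```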
Answer 2 (score: 2)
Using base R:
lot <- aggregate(
Value ~ ID,
df,
function(x) names(sort(table(x), decreasing=TRUE))[1]
)
df$Value <- lot[match(df$ID, lot$ID), "Value"]
df
  ID    Value
  <chr> <chr>
1 A     blue
2 A     blue
3 A     blue
4 B     orange
5 B     orange
6 c     NA
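Note that this output gives "orange" rather than "red" for ID B: table() orders its entries alphabetically, so the tie is not resolved in favor of the first value, and the formula interface of aggregate drops rows where Value is NA. A possible base R variant that follows the tie rule from the question is sketched below. The helper first_mode is a hypothetical name, and ignoring NA when other values are present is an assumption, not something the question specifies.

```r
ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)

# Hypothetical helper: most frequent non-NA value, ties go to the first seen
first_mode <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_character_)
  # Fixing the levels in order of appearance keeps table() from sorting alphabetically
  counts <- table(factor(x, levels = unique(x)))
  names(counts)[which.max(counts)]
}

# ave() applies the function within each ID and returns a vector aligned with Value
Value_output <- ave(Value, ID, FUN = function(x) rep(first_mode(x), length(x)))
Value_output
```

Since ave() returns a vector of the same length as its input, no separate match() step is needed to map the group results back onto the rows.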