Question

library(tibble)

ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)
df <- tibble(ID, Value)

I need to group by ID and keep the most frequent Value within each group. If there is a tie (as for ID == "B"), the first value should be chosen.

The resulting variable should look like this:

Value_output <- c("blue", "blue", "blue", "red", "red", NA)

Answer 1

A solution using the data.table package (count each Value within each ID).

ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)

library(data.table)
foo <- data.table(ID, Value)
setkey(foo, ID)
foo[foo[, .N, .(ID, Value)][order(N, decreasing = TRUE)][, .(Value = Value[1]), ID]]$i.Value
[1] "blue" "blue" "blue" "red"  "red"  NA
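The one-liner above chains several steps, so it may help to see them unpacked. This is only a sketch of the same idea: the intermediate names counts, ranked and top are made up for illustration, the result is attached with match() instead of a keyed join, and it assumes that data.table keeps tied counts in their original order when sorting.

```r
library(data.table)

ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)
foo <- data.table(ID, Value)

# Step 1: count each (ID, Value) pair; groups come out in order of first appearance
counts <- foo[, .N, by = .(ID, Value)]

# Step 2: largest counts first (assumption: tied counts keep their original order)
ranked <- counts[order(-N)]

# Step 3: the first row per ID is now its most frequent Value
top <- ranked[, .(Value_output = Value[1]), by = ID]

# Step 4: look up each row's ID in the per-ID result
foo[, Value_output := top$Value_output[match(ID, top$ID)]]
foo$Value_output
```

Because no key is set here, the rows of foo stay in their original order.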

Answer 2

We can get the Mode by group:

library(dplyr)
df %>%
   group_by(ID) %>%
   arrange(ID, is.na(Value)) %>% # put non-NA values first so they win a tie
   mutate(Value_output = Mode(Value))

where Mode is defined as:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
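To see why this Mode satisfies the tie rule from the question, here is the same function with comments, called on each group's values by hand. The intermediate name counts is added here for readability only.

```r
Mode <- function(x) {
  ux <- unique(x)                   # distinct values, in order of first appearance
  counts <- tabulate(match(x, ux))  # how often each distinct value occurs
  ux[which.max(counts)]             # which.max() returns the first maximum
}

Mode(c("blue", "blue", "green"))  # group A
Mode(c("red", "orange"))          # group B: a tie
Mode(NA_character_)               # group c: only NA
```

Since unique() keeps first-appearance order and which.max() picks the first maximum, a tie goes to the value that occurs first in the group.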

Answer 3

Using base R:

lot <- aggregate(
  Value ~ ID, 
  df, 
  function(x) names(sort(table(x), decreasing=TRUE))[1]
)
df$Value <- lot[match(df$ID, lot$ID), "Value"]
df
  ID    Value 
  <chr> <chr> 
1 A     blue  
2 A     blue  
3 A     blue  
4 B     orange
5 B     orange
6 c     NA
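Note that the output above gives "orange" for ID B, whereas the question asks for "red". This appears to be because table() lists its names in sorted order, so a tie goes to the alphabetically first value rather than the first one in the data. A possible base R sketch that keeps the first-appearing value on a tie uses ave() with a small helper; the name first_mode is hypothetical, and a plain data.frame stands in for the tibble.

```r
ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)
df <- data.frame(ID, Value, stringsAsFactors = FALSE)

# Most frequent value; ties go to the value that appears first in x
first_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# ave() applies the helper within each ID and recycles the result over the group
df$Value_output <- ave(df$Value, df$ID, FUN = first_mode)
df$Value_output
```

Unlike the overwrite of df$Value above, this writes to a new column, so the original values remain available for comparison.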
