Replace values with the most frequent value

Date: 2018-07-17 14:40:45

Tags: r

library(tibble)

ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)
df <- tibble(ID, Value)

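I have to group by ID and keep the most frequently repeated Value within each group. If there is a tie (as for ID == "B"), the first value should be chosen.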

The Value variable should end up looking like this:

Value_output <- c("blue", "blue", "blue", "red", "red", NA)

3 answers:

Answer 0 (score: 2)

A solution using the data.table package (counting how often each Value occurs within each ID).

ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)

library(data.table)
foo <- data.table(ID, Value)
setkey(foo, ID)
foo[foo[, .N, .(ID, Value)][order(N, decreasing = TRUE)][, .(Value = Value[1]), ID]]$i.Value
[1] "blue" "blue" "blue" "red"  "red"  NA    
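
The chained one-liner above can be hard to read, so here is a minimal sketch of the same idea split into steps. It is not the answerer's exact code: the intermediate names counts and top and the column name Value_output are assumptions introduced for illustration, and it uses an update join instead of setkey.

```r
library(data.table)

ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)
foo <- data.table(ID, Value)

# Step 1: count how often each Value occurs within each ID.
counts <- foo[, .N, by = .(ID, Value)]

# Step 2: put the most frequent combinations first. This relies on the sort
# being stable, so tied values keep their order of first appearance.
counts <- counts[order(-N)]

# Step 3: keep the top-ranked Value for each ID.
top <- counts[, .(Value = Value[1]), by = ID]

# Step 4: update join, which writes the winner back onto every row of foo
# without changing foo's row order.
foo[top, on = "ID", Value_output := i.Value]
foo$Value_output
```

Writing the result into a new column keeps the original Value available for comparison.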

Answer 1 (score: 2)

We can get the Mode of Value by group, using the Mode function defined further down:

library(dplyr)
df %>%
   group_by(ID) %>%
   arrange(ID, is.na(Value)) %>% # put NAs last so a non-NA value wins a tie
   mutate(Value_output = Mode(Value))

where Mode is defined as:

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
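
To make the tie handling concrete, here is a small sketch of how this Mode function behaves on a few inputs. The inputs are made up for illustration. Because which.max returns the first maximum, a tie goes to whichever value appears first, which is why the pipeline sorts NA values to the end of each group before calling it.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

Mode(c("blue", "blue", "green"))  # "blue"
Mode(c("red", "orange"))          # "red": a tie goes to the first value seen
Mode(c(NA, "red"))                # NA: NA is counted like any other value
Mode(c("red", NA))                # "red"
```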

Answer 2 (score: 2)

Using base R:

lot <- aggregate(
  Value ~ ID, 
  df, 
  function(x) names(sort(table(x), decreasing=TRUE))[1]
)
df$Value <- lot[match(df$ID, lot$ID), "Value"]
df
  ID    Value 
  <chr> <chr> 
1 A     blue  
2 A     blue  
3 A     blue  
4 B     orange
5 B     orange
6 c     NA
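
Note that the output above gives "orange" for ID B, while the question asks for the first value ("red") on a tie. This happens because table() orders its names alphabetically, so sort() sees "orange" before "red". A minimal sketch of one possible adjustment, assuming ties should be broken by order of first appearance; first_mode is a hypothetical helper name and Value_output a hypothetical column name, neither from the original answer:

```r
ID <- c("A", "A", "A", "B", "B", "c")
Value <- c("blue", "blue", "green", "red", "orange", NA)
df <- data.frame(ID, Value, stringsAsFactors = FALSE)

# Hypothetical helper: most frequent value, ties broken by first appearance.
first_mode <- function(x) {
  counts <- table(factor(x, levels = unique(x)))  # levels keep input order
  names(counts)[which.max(counts)]                # which.max picks the first maximum
}

# The formula interface of aggregate() drops rows with NA in Value,
# so ID "c" has no entry in lot and ends up as NA after match().
lot <- aggregate(Value ~ ID, df, first_mode)
df$Value_output <- lot[match(df$ID, lot$ID), "Value"]
df$Value_output
# "blue" "blue" "blue" "red" "red" NA
```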