Question

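I am looking for a mode function in R that I can use with dplyr. The two posts I have seen treat "ties" very differently. This post (Ken Williams) handles ties by selecting the first-appearing value in the set of modes. This post handles ties by listing both values in the same cell.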

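I am looking for a mode function that treats ties as NA and excludes missing values. I used Gregor's post to treat ties as NA, but I can't seem to exclude the missing values.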

The variable DF$Color is of character type.

Here is an example of DF:

library(dplyr)  # provides arrange(), group_by(), summarise() and %>%

Category<-c("A","B","B","C","A","A","A","B","C","B","C","C", "D", "D")
Color<-c("Red","Blue","Yellow","Blue","Green","Blue","Green","Yellow","Blue","Red","Red","Red","Yellow", NA)
DF<-data.frame(Category,Color)
DF <- arrange(DF, Category)
DF
DF$Color <- as.character(DF$Color)

The code that still includes NA is as follows:

 mode <- function(x) {
  ux <- unique(x)
  tx <- tabulate(match(x, ux))
  if(length(unique(tx)) == 1) {
    return(NA)
  }
  max_tx <- tx == max(tx)
  return(ux[max_tx])
}

    DF %>%
      group_by(Category) %>%
      summarise(Mode = mode(Color))

I am trying to work out the code that excludes NA. The resulting df should look like:

  Category Mode  
  <fct>    <fct> 
1 A        Green 
2 B        Yellow
3 C        NA    
4 D        Yellow
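
To illustrate where the NA gets in the way, here is a minimal base R sketch that runs the function above on the Color values of category D alone. The vector name d is only for illustration. Because unique() keeps NA as a distinct value, "Yellow" and NA each get a count of 1, and the equal counts are read as a tie.

```r
# The question's function, unchanged.
mode <- function(x) {
  ux <- unique(x)
  tx <- tabulate(match(x, ux))
  if (length(unique(tx)) == 1) {
    return(NA)
  }
  max_tx <- tx == max(tx)
  return(ux[max_tx])
}

d <- c("Yellow", NA)            # the Color values of category D (illustrative name)
unique(d)                       # NA is kept as a distinct value
tabulate(match(d, unique(d)))   # "Yellow" and NA are each counted once
mode(d)                         # equal counts are treated as a tie, so NA comes back
```

So the missing value has to be dropped before the counts are compared, not after.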

Answer 1

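The following changes to the function ensure that the correct type of NA is returned depending on the input, and that the function works with vectors of length 1. Wrapping x in na.omit() before unique() is what excludes the missing values.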

mode <- function(x) {
  ux <- unique(na.omit(x))
  tx <- tabulate(match(x, ux))
  if (length(ux) != 1 && sum(max(tx) == tx) > 1) {
    if (is.character(ux)) return(NA_character_) else return(NA_real_)
  }
  max_tx <- tx == max(tx)
  return(ux[max_tx])
}

DF %>%
  group_by(Category) %>%
  summarise(Mode = mode(Color))

# A tibble: 4 x 2
  Category Mode  
  <fct>    <chr> 
1 A        Green 
2 B        Yellow
3 C        NA    
4 D        Yellow
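
As a quick sanity check outside dplyr, the function can be called directly on small vectors. This is only a sketch: the inputs are the Color values of groups A, C and D from the example, plus a single-element vector, and the condition uses the scalar `&&` (an assumption on my part; it behaves the same as `&` for these length-1 comparisons).

```r
# The answer's function, written with the scalar `&&` in the condition.
mode <- function(x) {
  ux <- unique(na.omit(x))          # distinct non-missing values
  tx <- tabulate(match(x, ux))      # counts per distinct value; NA matches are ignored
  if (length(ux) != 1 && sum(max(tx) == tx) > 1) {
    if (is.character(ux)) return(NA_character_) else return(NA_real_)
  }
  ux[tx == max(tx)]
}

mode(c("Red", "Green", "Blue", "Green"))  # group A: one value is most frequent
mode(c("Blue", "Blue", "Red", "Red"))     # group C: a tie
mode(c("Yellow", NA))                     # group D: NA is dropped before counting
mode("Red")                               # length-1 input
```

Two things to keep in mind: defining a function called mode masks base::mode for the session, and an input consisting only of NA is a case the answer does not discuss, so it may need its own guard before being used inside summarise().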

R mode function that returns no mode for ties and excludes NA
