Question

The n_distinct documentation says:

This is a faster and more concise equivalent of length(unique(x))

I tried:

library(dplyr)

df <- data.frame(x = c(10, 4, 1, 6, 3, 1, 1), y = c(letters[1:7]))

length(unique(df$x))
#[1] 5

n_distinct(df$x)
#[1] 5

OK, the results are the same.

But:

df%>%
  n_distinct(.$x)
#[1] 7

What is wrong with the last call?

Answer 1

When you do this:

df %>% n_distinct(.$x)

you are actually doing this, because the pipe still passes df as the first argument:

n_distinct(df, df$x)

In this case, it returns the number of distinct rows of df, which is 7 because every value of y is different.
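As a quick check of that expansion, here is a minimal sketch using the df from the question. The names piped, expanded and distinct_rows are introduced here only for illustration. It relies on magrittr's documented behavior that the left-hand side is still inserted as the first argument when . appears only inside a nested expression such as .$x.

```r
library(dplyr)

df <- data.frame(x = c(10, 4, 1, 6, 3, 1, 1), y = c(letters[1:7]))

# The piped call from the question
piped <- df %>% n_distinct(.$x)

# What the pipe expands to: df is passed as the first argument
expanded <- n_distinct(df, df$x)

# The number of distinct rows, counted another way
distinct_rows <- nrow(distinct(df))

piped == expanded       # TRUE
piped == distinct_rows  # TRUE, because every value of y is unique
```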

If you drop column y, you get the same result as the first two operations:

df[-2] %>% n_distinct(.$x)

Result:

5
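Dropping y only works here by coincidence of the data layout. If the goal is to count distinct values of x inside a pipeline, a few alternatives are sketched below. The variable names by_pull, by_braces and by_summarise are made up for this example; pull(), summarise() and the brace form of the magrittr pipe are existing features, but treat this as one possible approach rather than the only one.

```r
library(dplyr)

df <- data.frame(x = c(10, 4, 1, 6, 3, 1, 1), y = c(letters[1:7]))

# Option 1: extract the column as a vector first, then count
by_pull <- df %>% pull(x) %>% n_distinct()

# Option 2: wrap the call in braces so the pipe does not
# insert df as the first argument
by_braces <- df %>% { n_distinct(.$x) }

# Option 3: count inside summarise(); this returns a one-row
# data frame with the count in column n
by_summarise <- df %>% summarise(n = n_distinct(x))
```

All three count only the values of x, so they agree with length(unique(df$x)) regardless of what the other columns contain.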