Question

Let's say I have a data table such as:

year  location
2026  NYC
2026  NYC
2026  NYC
2026  LA
2027  LA
2028  NYC
2028  NYC

It can be created with:

# class must include "data.frame" so that dplyr verbs such as filter() dispatch
dt <- structure(list(location = c("NYC", "NYC", "NYC", "LA", "LA", "NYC", "NYC"),
                     year = c(2026, 2026, 2026, 2026, 2027, 2028, 2028)),
                class = c("data.table", "data.frame"),
                row.names = c(NA, -7L))

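I want to count the number of unique cities (the location column) in a given year, say 2026. In this case the result would be 2, because the only cities are NYC and LA. What goes on the last line below?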

library(dplyr)
dt %>%
  filter(year == 2026) %>%
  # what goes here?

Answer 1

We can use n_distinct() to get the number of unique values:

library(dplyr)
dt %>%
  filter(year == 2026) %>%
  summarise(count = n_distinct(location))

#  count
#1     2

Or do the filtering inside summarise() itself:

dt %>% summarise(count = n_distinct(location[year == 2026]))

Or, if we want the result as a plain vector, we can add pull(count):

dt %>%
  filter(year == 2026) %>%
  summarise(count = n_distinct(location)) %>%
  pull(count)
#[1] 2

In base R, the equivalent is:

length(unique(dt$location[dt$year == 2026]))
#[1] 2
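The question only asks about 2026, but if the count is needed for every year at once, the dplyr pipeline above can be generalised by replacing filter() with group_by(). This is a minimal sketch that goes beyond the original answer; it assumes the example data rebuilt as a plain data.frame with the same location and year columns.

```r
library(dplyr)

# Same example data as in the question, rebuilt as a plain data.frame
dt <- data.frame(location = c("NYC", "NYC", "NYC", "LA", "LA", "NYC", "NYC"),
                 year = c(2026, 2026, 2026, 2026, 2027, 2028, 2028))

# One row per year, each with the number of distinct locations in that year
counts <- dt %>%
  group_by(year) %>%
  summarise(count = n_distinct(location))
counts
```

For the example data this gives 2 for 2026, 1 for 2027 and 1 for 2028; filtering the result on year == 2026 recovers the single answer above.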

Answer 2

We can use data.table:

library(data.table)
setDT(dt)[year == 2026, .(count = uniqueN(location))]
#   count
#1:     2
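The same uniqueN() call also works per group through data.table's by argument, which gives the count for every year in one step. A minimal sketch, an extension beyond the original answer, assuming the example data is rebuilt with data.table():

```r
library(data.table)

# Same example data as in the question
dt <- data.table(location = c("NYC", "NYC", "NYC", "LA", "LA", "NYC", "NYC"),
                 year = c(2026, 2026, 2026, 2026, 2027, 2028, 2028))

# uniqueN() evaluated separately within each year group
counts <- dt[, .(count = uniqueN(location)), by = year]
counts
```

Groups appear in order of first occurrence, so the rows here are 2026, 2027, 2028 with counts 2, 1 and 1.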

Or using base R:

length(table(subset(dt, year == 2026, select = location)))
#[1] 2
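In base R, a per-year version of the same count can be sketched with tapply(), applying length(unique(...)) to the locations within each year. This is an illustration beyond the original answer and assumes the example data as a plain data.frame:

```r
# Same example data as in the question
dt <- data.frame(location = c("NYC", "NYC", "NYC", "LA", "LA", "NYC", "NYC"),
                 year = c(2026, 2026, 2026, 2026, 2027, 2028, 2028))

# Named vector: one distinct-location count per year, named by year
counts <- tapply(dt$location, dt$year, function(x) length(unique(x)))
counts
counts[["2026"]]
```

The result is named by year, so counts[["2026"]] picks out the value asked for in the question (2).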
