Suppose I have a data table such as:
year city
2026 NYC
2026 NYC
2026 NYC
2026 LA
2027 LA
2028 NYC
2028 NYC
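It can be created with:
# The column is named city to match the table above; the class vector must
# include "data.frame" so that dplyr and base R methods dispatch correctly.
dt <- structure(list(year = c(2026, 2026, 2026, 2026, 2027, 2028, 2028),
city = c("NYC", "NYC", "NYC", "LA", "LA", "NYC", "NYC")),
class = c("data.table", "data.frame"),
row.names = c(NA, -7L))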
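I want to count the number of unique cities in a given year, say 2026. In this case the result would be 2, because only NYC and LA appear in that year.
What should the last line of the following be?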
dt %>%
filter(year == 2026) %>%
# What goes here?
Answer 0 (score: 1)
We can use n_distinct to get the number of unique values:
library(dplyr)
dt %>%
filter(year == 2026) %>%
summarise(count = n_distinct(city))
# count
#1 2
Alternatively, do the filtering inside summarise itself:
dt %>% summarise(count = n_distinct(city[year == 2026]))
Or, if we want the result as a vector rather than a data frame, append pull(count):
dt %>%
filter(year == 2026) %>%
summarise(count = n_distinct(city)) %>%
pull(count)
#[1] 2
In base R, the equivalent is:
length(unique(dt$city[dt$year == 2026]))
#[1] 2
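As a quick check, the dplyr and base R forms above can be run side by side. This is a minimal sketch, assuming the sample data is rebuilt with data.frame() (rather than the structure() call from the question) and the columns are named year and city as in the printed table. The per-year count at the end is an illustrative extension, not part of the original answer.

```r
library(dplyr)

# Assumption: sample data rebuilt as a plain data frame, with the
# column names taken from the table printed in the question.
dt <- data.frame(
  year = c(2026, 2026, 2026, 2026, 2027, 2028, 2028),
  city = c("NYC", "NYC", "NYC", "LA", "LA", "NYC", "NYC")
)

# dplyr: distinct cities in 2026, returned as a plain vector
n_dplyr <- dt %>%
  filter(year == 2026) %>%
  summarise(count = n_distinct(city)) %>%
  pull(count)

# base R: same count without any packages
n_base <- length(unique(dt$city[dt$year == 2026]))

# Illustrative extension: distinct cities for every year at once
per_year <- dt %>%
  group_by(year) %>%
  summarise(count = n_distinct(city))
```

Grouping by year avoids repeating the filter step when the count is needed for more than one year.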
Answer 1 (score: 1)
We can use data.table:
library(data.table)
setDT(dt)[year == 2026, .(count = uniqueN(city))]
# count
#1: 2
Or, using base R:
length(table(subset(dt, year == 2026, select = city)))
#[1] 2
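The data.table approach extends naturally to all years by using by instead of a row filter. A minimal sketch, assuming the sample data is rebuilt with data.table() and the columns are named year and city as in the printed table; the per-year variant is an illustration, not part of the original answer.

```r
library(data.table)

# Assumption: sample data rebuilt directly as a data.table, with the
# column names taken from the table printed in the question.
dt <- data.table(
  year = c(2026, 2026, 2026, 2026, 2027, 2028, 2028),
  city = c("NYC", "NYC", "NYC", "LA", "LA", "NYC", "NYC")
)

# Single year: filter rows in i, count distinct values in j
n_2026 <- dt[year == 2026, uniqueN(city)]

# Every year: group with by and count distinct cities per group
per_year <- dt[, .(count = uniqueN(city)), by = year]
```

Here n_2026 is a plain integer, while per_year is a data.table with one row per year.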