Question

I have a dataset in the following format:

Item Year
A    2018
B    2018
B    2019
A    2017
Z    2019

I selected only the items from 2018 using the following:

library(dplyr)
data2 <- data %>% filter(Year == "2018")

Now, when I use table() to get the item counts, I run into a problem. The output looks like this:

table(data2$Item)

A B Z
1 1 0

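I don't understand why Z is included here. There is no Z item in data2, and it is messing up my summary statistics.

Is there a way to prevent items from the original dataset from being included? I also tried filtering the original dataset without dplyr, but table() still returns the same output.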

Answer 1

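If Item is a factor, the cause is most likely unused levels: filtering rows does not remove levels from the factor. If we check levels, Z is still there: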

levels(data2$Item)
#[1] "A" "B" "Z"
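To see why this happens, here is a minimal base R sketch. It assumes Item is a factor, as in the data at the end of this answer; the vector names item, year and item_2018 are made up for illustration. Subsetting a factor keeps its full level set, and table() counts every level.

```r
# Rebuild the example columns as plain vectors (illustrative names).
item <- factor(c("A", "B", "B", "A", "Z"))
year <- c(2018L, 2018L, 2019L, 2017L, 2019L)

# Subsetting keeps the original level set as metadata.
item_2018 <- item[year == 2018L]

levels(item_2018)          # "Z" is still a level, although no value uses it
table(item_2018)           # so table() reports a zero count for "Z"

nlevels(item_2018)         # number of levels, used or not
length(unique(item_2018))  # number of values that actually occur
```

The gap between nlevels() and length(unique()) is a quick way to check whether a factor carries unused levels.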

We can use droplevels to remove the unused levels:

table(droplevels(data2$Item))
# A B
# 1 1
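If the filtered data frame is reused for several summaries, it may be more convenient to drop the unused levels once, on the whole data frame, instead of inside every table() call. The sketch below relies on the data.frame method of droplevels in base R; the data frame is rebuilt inline only so the snippet is self-contained.

```r
# Same example data, rebuilt inline.
data <- data.frame(
  Item = factor(c("A", "B", "B", "A", "Z")),
  Year = c(2018L, 2018L, 2019L, 2017L, 2019L)
)

# Filter with base R, then drop unused levels from every factor column.
data2 <- droplevels(data[data$Year == 2018L, ])

levels(data2$Item)  # only the levels that still occur
table(data2$Item)   # no zero-count entry for "Z"
```

Calling factor(data2$Item) on a single column has a similar effect, since it rebuilds the factor from the values that are present.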

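Or specify .drop in count (for an ungrouped data frame it already defaults to TRUE, so it is written out here only to be explicit):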

library(dplyr)
data %>%
    filter(Year == "2018") %>% 
    count(Item, .drop = TRUE)
#  Item n
#1    A 1
#2    B 1
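For contrast, the following sketch shows what the .drop argument of count controls. It assumes dplyr is installed; the result names kept and dropped are illustrative. With .drop = FALSE, the unused factor level is kept as a row with a count of zero, which mirrors what table() does.

```r
library(dplyr)

# Same example data, rebuilt inline.
data <- data.frame(
  Item = factor(c("A", "B", "B", "A", "Z")),
  Year = c(2018L, 2018L, 2019L, 2017L, 2019L)
)

# .drop = FALSE keeps every factor level, including empty ones.
kept <- data %>%
  filter(Year == 2018) %>%
  count(Item, .drop = FALSE)

# .drop = TRUE returns only the levels that occur after filtering.
dropped <- data %>%
  filter(Year == 2018) %>%
  count(Item, .drop = TRUE)

kept     # three rows: A, B, and Z with n = 0
dropped  # two rows: A and B
```

Note that count only changes the summary; the Item column in the filtered data still carries the Z level unless droplevels is applied.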

Data

data <- structure(list(Item = structure(c(1L, 2L, 2L, 1L, 3L), .Label = c("A", 
"B", "Z"), class = "factor"), Year = c(2018L, 2018L, 2019L, 2017L, 
2019L)), row.names = c(NA, -5L), class = "data.frame")
