Question

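I am using the planes data frame from the R package nycflights13. I am trying to select the years in which all three types (Fixed wing multi engine, Fixed wing single engine, Rotorcraft) occur within the same year. I tried creating a subset: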

subset(planes$year, planes$type == "Fixed wing multi engine" & 
planes$type == "Fixed wing single engine" & planes$type == "Rotorcraft")

and made several attempts with dplyr:

planes %>% filter(type == "Fixed wing multi engine" & 
type == "Fixed wing single engine" & type == "Rotorcraft") %>% group_by(year)

Neither worked. How would I go about doing something like this? Thanks.

Answer 1

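Keep in mind that subset and filter operate on rows. Each row holds exactly one type, so no single row can match all three types, and a condition that joins the three comparisons with & is never TRUE.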
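
As a quick illustration, here is a minimal sketch in base R. The `toy` data frame is made up for this example (it is not taken from nycflights13), but it has the same one-type-per-row shape as planes:

```r
# Hypothetical toy data: one type per row, as in planes
toy <- data.frame(
  year = c(1975, 1975, 1975, 1980, 1980),
  type = c("Fixed wing multi engine", "Fixed wing single engine", "Rotorcraft",
           "Fixed wing multi engine", "Fixed wing single engine"),
  stringsAsFactors = FALSE
)

# A row's type is a single value, so it cannot equal three different strings at once
all_three <- toy$type == "Fixed wing multi engine" &
  toy$type == "Fixed wing single engine" &
  toy$type == "Rotorcraft"

any(all_three)                # FALSE
nrow(subset(toy, all_three))  # 0
```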

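One approach is to group by year and then count the number of distinct types. Since you know in advance that there are 3 types, you can filter on that count: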

library(dplyr)

planes %>% 
  group_by(year) %>%
  filter(n_distinct(type) == 3)

This returns 26 rows. You can use count() or distinct() to show that they belong to the years 1975 and 1985.
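
A sketch of that last step, assuming dplyr is installed. It runs on a small made-up tibble named `toy` (hypothetical data, not planes) so that it is self-contained; on the real data you would start the pipeline from planes instead:

```r
library(dplyr)

# Hypothetical toy data: 1975 has all three types, 1980 has only two
toy <- tibble(
  year = c(1975, 1975, 1975, 1980, 1980),
  type = c("Fixed wing multi engine", "Fixed wing single engine", "Rotorcraft",
           "Fixed wing multi engine", "Fixed wing single engine")
)

# Keep the groups (years) that contain exactly 3 distinct types
kept <- toy %>%
  group_by(year) %>%
  filter(n_distinct(type) == 3) %>%
  ungroup()

# distinct() lists each qualifying year once
years_with_all <- kept %>% distinct(year)
years_with_all$year   # 1975

# count() lists the same years together with the number of rows in each
rows_per_year <- kept %>% count(year)
```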

Answer 2

Here is a dplyr route. The key is (a) to group_by() year first, and (b) to use the n_distinct() function.

planes %>% group_by(year) %>% filter(n_distinct(type) == 3)

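Note: this code implicitly assumes that the data contains no type other than "Fixed wing multi engine", "Fixed wing single engine", and "Rotorcraft". That is true for the planes data frame, but it may not always hold. It is better to make the assumption explicit, at the cost of longer code.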

planes %>% 
  group_by(year) %>% 
  filter("Fixed wing multi engine" %in% type & 
         "Rotorcraft" %in% type & 
         "Fixed wing single engine" %in% type)

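The explicit check can also be written more compactly with all(). The following is a sketch, assuming dplyr is installed; the `required` vector and the `toy` tibble (including its "Balloon" type) are made up for illustration. The year 1990 shows why the explicit check is safer: it has three distinct types, but not the three required ones.

```r
library(dplyr)

# The types every qualifying year must contain (name chosen for this example)
required <- c("Fixed wing multi engine", "Fixed wing single engine", "Rotorcraft")

# Hypothetical toy data: 1990 has 3 distinct types, one of which is not required
toy <- tibble(
  year = c(1975, 1975, 1975, 1990, 1990, 1990),
  type = c("Fixed wing multi engine", "Fixed wing single engine", "Rotorcraft",
           "Fixed wing multi engine", "Fixed wing single engine", "Balloon")
)

# Counting distinct types keeps both years
by_count <- toy %>%
  group_by(year) %>%
  filter(n_distinct(type) == 3) %>%
  ungroup()

# Requiring each named type keeps only 1975
by_name <- toy %>%
  group_by(year) %>%
  filter(all(required %in% type)) %>%
  ungroup()

unique(by_count$year)  # 1975 1990
unique(by_name$year)   # 1975
```
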
Answer 3

Just putting it out there, here is a base R solution using ave():

n_types <- length(unique(planes$type))

unique(
  planes$year[ave(planes$type, planes$year, FUN = function(x) length(unique(x))) == n_types]
)

[1] 1985 1975
