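What's a better way to assign a categorical variable to groups with dplyr?

Date: 2016-08-20 00:39:56

Tags: r group-by filtering dplyr

I'm trying to assign a categorical variable (yes or no) to groups based on two other calculated categorical variables, each of which takes the values "yes" or "no". If a group has "yes" for both of the first two calculated variables, I want the whole group to be assigned "yes". There must be a better way to do this with filter or some window/ranking function. Here is the messy code I've come up with so far. filteredDF is the output I'm hoping to get. Thanks!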

#install.packages(c('nycflights13', 'dplyr'))
library('nycflights13')
library('dplyr')
data(flights)

filteredDF <- flights %>%
  mutate(variable1 = ifelse(month %in% c(1:6) & day %in% c(16:28), yes = 'yes', no = 'no')) %>% #create first calculated categorical variable
  mutate(variable2 = ifelse(month %in% c(7:12, 6) & day %in% c(1:16) , yes = 'yes', no = 'no')) %>% #create second calculated categorical variable
  group_by(tailnum) %>% # assign groups I'm interested in
  mutate(varTogether = ifelse('yes' %in% variable1 & 'yes' %in% variable2, yes = 'yes', no = 'no')) %>% # create 3rd categorical to filter by (assigned by group)
  ungroup() %>%
  filter(varTogether == 'yes') # filter out what I want

1 Answer

Answer 0 (score: 2)

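I haven't tested this, but it seems easier to use logical variables (TRUE/FALSE) rather than categorical ('yes'/'no') ones. It doesn't shorten things much, but it does clean them up.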

filteredDF <- flights %>%
  mutate(variable1 = month %in% 1:6  & day %in% 16:28,
         variable2 = month %in% 7:12 & day %in% 1:16) %>% 
  group_by(tailnum) %>% 
  mutate(varTogether = any(variable1) & any(variable2)) %>% 
  ungroup() %>%
  filter(varTogether)

(I'm assuming c(7:12, 6) was a mistake. Also, do you really want the date ranges of the two variables to overlap?)
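To see why c(7:12, 6) matters, here is a small base-R check that enumerates every month/day pair and keeps those satisfying both conditions at once. The helper names are hypothetical, made up for this illustration. With month 6 included, June 16 falls in both windows; with 7:12 only, no single date does. Note that even without overlap, the grouped any() approach still flags a tail number whose flights hit each window on different dates.

```r
# Hypothetical helpers mirroring the question's two conditions
in_window1          <- function(month, day) month %in% 1:6 & day %in% 16:28
in_window2_original <- function(month, day) month %in% c(7:12, 6) & day %in% 1:16
in_window2_fixed    <- function(month, day) month %in% 7:12 & day %in% 1:16

# Every month/day combination (impossible dates such as Feb 30 are harmless here)
days <- expand.grid(month = 1:12, day = 1:31)

# Dates that satisfy BOTH conditions at once
overlap_original <- days[in_window1(days$month, days$day) &
                         in_window2_original(days$month, days$day), ]
overlap_fixed    <- days[in_window1(days$month, days$day) &
                         in_window2_fixed(days$month, days$day), ]

print(overlap_original)  # only June 16
print(nrow(overlap_fixed))
```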

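You can shorten it further by omitting the intermediate variables, though that may be less readable. (Or you could define a function vt <- function(month, day) any(...) & any(...).)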

filteredDF <- flights %>%
  group_by(tailnum) %>% 
  mutate(varTogether=any(month %in% 1:6  & day %in% 16:28) &
                     any(month %in% 7:12 & day %in% 1:16)) %>%
  ungroup() %>%
  filter(varTogether)
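Following the vt idea, a minimal sketch: because filter() on grouped data is evaluated once per group, a single TRUE/FALSE per tailnum is recycled to every row of that group, so the helper can go straight into filter() and the mutate()/varTogether step disappears. The helper name vt and the small made-up tibble standing in for flights are assumptions for illustration; with the real data you would pipe flights instead of toy.

```r
library(dplyr)

# Hypothetical helper (name taken from the vt suggestion):
# TRUE for a group if any row falls in the first window AND
# any (possibly different) row falls in the second window.
vt <- function(month, day) {
  any(month %in% 1:6 & day %in% 16:28) &
    any(month %in% 7:12 & day %in% 1:16)
}

# Small made-up data standing in for nycflights13::flights
toy <- tibble(
  tailnum = c("A", "A", "B", "B", "C"),
  month   = c(3,   9,   3,   4,   8),
  day     = c(20,  5,   20,  20,  2)
)

# Grouped filter: vt() returns one value per tailnum, kept or dropped as a whole
filteredDF <- toy %>%
  group_by(tailnum) %>%
  filter(vt(month, day)) %>%
  ungroup()

print(filteredDF)  # both rows of tailnum "A"; "B" and "C" hit only one window
```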