Question

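I am trying to sample a dataset based on a Date-class column: only the quarterly months for "Active" rows, and every month for "Inactive" rows.

Here is my code: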

library(dplyr)
library(lubridate)
  
## data ##
                 
df <- structure(list( 
             mes = c("01/01/2000", "01/02/2000", "01/03/2000", 
"01/04/2000", "01/05/2000", "01/06/2000", "01/07/2000", "01/08/2000", 
"01/09/2000", "01/10/2000", "01/11/2000", "01/12/2000"),
              status = c("Active", "Inactive",
                         "Active", "Inactive",
                         "Active", "Inactive",
                         "Active", "Active",
                         "Inactive", "Active",
                         "Inactive", "Active")),
             class = "data.frame",
             row.names = c(NA, -12L))

## setting date class for "mes" column ##

df$mes <- as.Date(df$mes,
                  format = "%d/%m/%Y")

## sampling ##

sample_df <- df %>%  
  dplyr :: filter(status %in% "Active",
                  status %in% "Inactive") %>%
            dplyr :: filter_if(status == "Active",
            month(mes) %in% c(3,6,9,12),
            month(mes) %in% c(1,2,3,4,5,6,7,8,9,10,11,12))

Console output:

Error in is_logical(.p) : objeto 'status' no encontrado

Are there other libraries I could use to accomplish this task?

Answer 1

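In `dplyr::filter`, separating conditions with `,` means `&`, whereas here we need `|`. Using `&` returns 0 rows, because "status" cannot be both "Active" and "Inactive" in the same row.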
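A minimal base R sketch of that logic, using a hypothetical three-element `status` vector rather than the real data: conditions separated by commas in `filter()` behave like `&`, so asking for both values at once matches nothing, while `|` keeps every row that has either value.

```r
# Hypothetical status vector, for illustration only
status <- c("Active", "Inactive", "Active")

# What filter(df, cond1, cond2) keeps: rows where BOTH conditions are TRUE
both <- status %in% "Active" & status %in% "Inactive"

# What filter(df, cond1 | cond2) keeps: rows where EITHER condition is TRUE
either <- status %in% "Active" | status %in% "Inactive"

print(sum(both))    # 0 rows kept
print(sum(either))  # 3 rows kept
```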

df %>%  
  dplyr::filter(status %in% "Active" | status %in% "Inactive") %>% 
  dplyr::filter(status == 'Active', month(mes) %in% c(3, 6, 9, 12))

Also, `%in%` accepts a vector of `length` >= 1 on the right-hand side of the operator, so both values can go into a single call:
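A small base R sketch of that equivalence, assuming a hypothetical extra status value "Pending" so that one element falls outside the set: a single `%in%` with a length-2 right-hand side gives the same result as chaining two `==` comparisons with `|`.

```r
status <- c("Active", "Inactive", "Pending")  # "Pending" is a hypothetical value

# One %in% call with a length-2 right-hand side ...
in_set <- status %in% c("Active", "Inactive")

# ... matches two == comparisons joined with |
chained <- status == "Active" | status == "Inactive"

print(in_set)                      # TRUE TRUE FALSE
print(identical(in_set, chained))  # TRUE
```

One practical difference: `NA %in% "Active"` is `FALSE`, whereas `NA == "Active"` is `NA`.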

 df %>%
    dplyr::filter(status %in% c("Active", "Inactive")) %>%      
    dplyr::filter(status == 'Active', month(mes) %in% c(3, 6, 9, 12))

In the OP's filter statement

...
 month(mes) %in% c(3,6,9,12),
        month(mes) %in% c(1,2,3,4,5,6,7,8,9,10,11,12)

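implies that both conditions must be TRUE, but one is a subset of the other (the quarterly months are already contained in 1:12), so requiring both is the same as requiring only the quarterly condition.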
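To see the subset point concretely, here is a base R sketch over the month numbers 1 to 12 (what `month(mes)` returns for this data): combining the quarterly test with the all-months test using `&` leaves exactly the quarterly test.

```r
m <- 1:12  # month numbers, as month(mes) would return for this data

quarterly <- m %in% c(3, 6, 9, 12)
any_month <- m %in% 1:12           # TRUE for every month

# Requiring both conditions is the same as requiring only the stricter one
combined <- quarterly & any_month
print(identical(combined, quarterly))  # TRUE
print(m[combined])                     # 3 6 9 12
```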

Answer 2

To keep the quarterly months for "Active" status and all months for "Inactive" status, you can do:

library(dplyr)

df %>%
  mutate(month = lubridate::month(mes)) %>%
  filter(status == "Active" & month %in% c(3,6,9,12) | 
         status == "Inactive" & month %in% 1:12)

#         mes   status month
#1 2000-02-01 Inactive     2
#2 2000-03-01   Active     3
#3 2000-04-01 Inactive     4
#4 2000-06-01 Inactive     6
#5 2000-09-01 Inactive     9
#6 2000-11-01 Inactive    11
#7 2000-12-01   Active    12
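The filter above relies on R giving `&` higher precedence than `|`, so `A & B | C & D` is read as `(A & B) | (C & D)`. A base R sketch with hypothetical three-row vectors (`mon` stands in for the `month` column) shows that the unparenthesised and parenthesised forms agree:

```r
status <- c("Active", "Active", "Inactive")  # hypothetical rows
mon    <- c(3, 5, 5)

# & binds tighter than |, so this expression ...
implicit <- status == "Active" & mon %in% c(3, 6, 9, 12) |
            status == "Inactive" & mon %in% 1:12

# ... is parsed exactly like this one
explicit <- (status == "Active" & mon %in% c(3, 6, 9, 12)) |
            (status == "Inactive" & mon %in% 1:12)

print(implicit)                       # TRUE FALSE TRUE
print(identical(implicit, explicit))  # TRUE
```

Adding the parentheses explicitly costs nothing and makes the intent easier to read.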

Since you want every month for "Inactive", you can also simplify the second condition:

df %>%
  mutate(month = lubridate::month(mes)) %>%
  filter(status == "Active" & month %in% c(3,6,9,12) | 
         status == "Inactive")
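On the question of other libraries: base R alone is enough. The sketch below rebuilds the question's 12-row data, extracts the month with `format(..., "%m")` instead of `lubridate::month()`, and keeps each row whose month is allowed for its status. The `allowed` lookup list is a hypothetical design choice that makes it easy to add statuses later; a plain logical expression like the one above works just as well.

```r
# Same 12 rows as in the question, parsed with the same "%d/%m/%Y" format
df <- data.frame(
  mes = as.Date(sprintf("01/%02d/2000", 1:12), format = "%d/%m/%Y"),
  status = c("Active", "Inactive", "Active", "Inactive", "Active", "Inactive",
             "Active", "Active", "Inactive", "Active", "Inactive", "Active"),
  stringsAsFactors = FALSE
)

# Month number without lubridate: format the Date as "%m", then convert
mon <- as.integer(format(df$mes, "%m"))

# Hypothetical lookup table: months to keep for each status
allowed <- list(Active = c(3, 6, 9, 12), Inactive = 1:12)

# TRUE when a row's month is among the months allowed for its status
keep <- mapply(function(s, m) m %in% allowed[[s]], df$status, mon)

result <- df[keep, ]
print(result)
```

This keeps the same seven rows as the dplyr version (February, March, April, June, September, November and December), although base subsetting preserves the original row names instead of renumbering them.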

How do I correctly filter a df based on conditions in R?
