R Tidyverse: Filter on multiple conditions

Question

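How can I tell R (dplyr) to "reset" the filter so that I can filter a second time within the same pipe? Otherwise I will have to write a "for-loop" for every identifier number. The minimal working example below highlights the problem I am facing.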

library(tidyverse)

data.tibble <- tribble(                      # sample data
  ~id,~year, ~identifier, ~items, ~cost,
  10, 2018, "aaca" , 10, 25, # "aaca" toy cars
  20, 2018, "aaca" , 12, 28, # "aaca" toy cars
  10, 2018, "bbda" , 14, 30, # "bbda" pens 
  20, 2018, "bbda" , 27, 29, # "bbda" pens
)

a <- data.tibble %>%                       # first block works fine on its own
  group_by(id, year) %>% 
  filter(str_detect(identifier, "^a")) %>% # looks for identifiers that begin with "a"
  summarise(toycars_sold=sum(items),
            toycars_cost=sum(cost)) 
a 

b <- data.tibble %>%                       # Second block works fine on its own
  group_by(id, year) %>% 
  filter(str_detect(identifier,"^b")) %>% 
  summarise(pens_sold=sum(items),
            pens_cost=sum(cost))
b

I run into trouble when I ask dplyr to filter on another identifier within the same pipe; I then get an error message:

data.tibble %>% 
  group_by(id, year) %>% 
  filter(str_detect(identifier, "^a")) %>% 
  summarise(toycars_sold=sum(items),
            toycars_cost=sum(cost)) %>% 
  filter(str_detect(identifier,"^b")) %>%  # error here: object 'identifier' not found
  summarise(pens_sold=sum(items),
            pens_cost=sum(cost))


What I would like to end up with is:

c <- full_join(a,b)

There are a myriad of codes ("identifiers") that I will have to go through (sometimes there is more than one identifier for a single item).
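When one item has several identifiers, a single regular expression with alternation (`|`) can match all of them, so each item still needs only one pattern. A small sketch; "aacb" is a hypothetical second code for toy cars, not one from the data above:

```r
library(stringr)

# Hypothetical identifiers: "aacb" stands in for a second toy-car code
identifiers <- c("aaca", "aacb", "bbda")

# One pattern per item; the alternation (aaca|aacb) covers both toy-car codes
is_toycar <- str_detect(identifiers, "^(aaca|aacb)")
is_toycar
```

The same kind of pattern can be passed to `filter(str_detect(identifier, ...))` unchanged.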

R then tells me that the object "identifier" cannot be found.
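The error follows from what `summarise()` returns: one row per group containing only the grouping columns (here `id` and `year`) plus the newly created summary columns. Every other column, `identifier` included, is dropped, so a second `filter()` on `identifier` has nothing to refer to. A minimal check on the sample data (the name `after_summary` is just for illustration):

```r
library(tidyverse)

# Same sample data as in the question
data.tibble <- tribble(
  ~id, ~year, ~identifier, ~items, ~cost,
  10, 2018, "aaca", 10, 25,
  20, 2018, "aaca", 12, 28,
  10, 2018, "bbda", 14, 30,
  20, 2018, "bbda", 27, 29
)

after_summary <- data.tibble %>%
  group_by(id, year) %>%
  filter(str_detect(identifier, "^a")) %>%
  summarise(toycars_sold = sum(items),
            toycars_cost = sum(cost))

# Only the grouping columns and the new summary columns survive summarise()
names(after_summary)
```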

Any help is greatly appreciated.

Old question, somewhat hard to understand

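I have a problem that I can't seem to wrap my head around. How do I tell the tidyverse to reset the filter after calling the first summarise() function? Otherwise I will have to create a "for-loop" for every "id-code" (I believe regular expression is the right term) that I want to filter on.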

output <- vector("list", length(object18)) # object to store the output in

for (i in seq_along(object18)) { # object18: list of store data for 2018 to loop over
  output[[i]] <- object18[[i]] %>%
    group_by(storeid, month, year, quarter) %>%  # variables to group by
    filter(str_detect(itemcode, "^CODE")) %>%    # CODE stands for some identifier string
    summarize(toys = sum(items),
              max.items.sold = max(items)) %>%
    filter(str_detect(itemcode, "^NEWCODE")) %>% # FILTER ON NEW CODE (possibly several codes) DOESN'T WORK
    summarize(toys2 = sum(items),
              itemstoy2 = max(items))
}

Does anyone have an idea how to achieve my goal?

Please don't be hard on me, I'm new to R.

Thanks in advance, David.

Answer 1

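There is no way to "roll back" a filter and return to the original, unfiltered data within a pipe. Such functionality could possibly be implemented, but the tidyverse offers better options for producing the same output.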
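For a small, fixed number of codes, one option (a swapped-in technique, not the approach this answer goes on to describe) is to skip `filter()` entirely and subset inside each summary with logical indexing, so that all item types are computed in a single `summarise()`. A sketch on the question's sample data; it becomes verbose quickly with many codes, which is where the steps below pay off:

```r
library(tidyverse)

data.tibble <- tribble(
  ~id, ~year, ~identifier, ~items, ~cost,
  10, 2018, "aaca", 10, 25,
  20, 2018, "aaca", 12, 28,
  10, 2018, "bbda", 14, 30,
  20, 2018, "bbda", 27, 29
)

# Instead of filtering rows away, subset the columns inside each summary,
# so the full data stays available for every item type
result <- data.tibble %>%
  group_by(id, year) %>%
  summarise(
    toycars_sold = sum(items[str_detect(identifier, "^a")]),
    toycars_cost = sum(cost[str_detect(identifier, "^a")]),
    pens_sold    = sum(items[str_detect(identifier, "^b")]),
    pens_cost    = sum(cost[str_detect(identifier, "^b")]),
    .groups = "drop"
  )

result
```

This yields the same columns as `full_join(a, b)` from the question, without building `a` and `b` separately.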

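For this kind of problem I would:

1. Define a custom function that takes a data.frame and your regex filter (as a string) as arguments and returns the sums of items sold and of costs.
2. Define a named vector with the item names as names and the regex filters as values.
3. Wrap the existing data in a list inside a tibble, cross it with the vector from step 2, and add the vector names as a new column.
4. Apply the custom function defined in step 1 with map2 to produce the filtered datasets.
5. Select the name column and the column containing the filtered data, and unnest the latter.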

Now you have the data in long format, which is already a good format for many tasks. In a final step you can bring it into the shape you want by ...

... using pivot_wider.
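To illustrate that last step, here is a sketch of `pivot_wider()` on a long table with one row per item name, id and year. The values are copied from the reprex output further down; `names_from` supplies the column suffixes and `values_from` the prefixes, joined with the default separator `"_"`:

```r
library(tidyverse)

# Long format: one row per (item name, id, year)
long <- tribble(
  ~name,     ~id, ~year, ~sold, ~cost,
  "toycars", 10,  2018,  10,    25,
  "toycars", 20,  2018,  12,    28,
  "pens",    10,  2018,  14,    30,
  "pens",    20,  2018,  27,    29
)

# Spread to one row per (id, year), with columns such as sold_toycars, cost_pens
wide <- long %>%
  pivot_wider(names_from = name,
              values_from = c(sold, cost))

wide
```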

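If you want to filter on more than just a regex, you would need to create a list of expressions (instead of a character vector) and adapt the function to work with this list of filters.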
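A minimal sketch of that variant, assuming each filter is stored as a quosure built with `rlang::quo()` and spliced into `filter()` with `!!`. The helper name `sum_filter_expr` and the extra condition `items > 20` are made up for illustration; `bind_rows(.id = "name")` turns the list names into the `name` column, giving the same long format as before:

```r
library(tidyverse)

data.tibble <- tribble(
  ~id, ~year, ~identifier, ~items, ~cost,
  10, 2018, "aaca", 10, 25,
  20, 2018, "aaca", 12, 28,
  10, 2018, "bbda", 14, 30,
  20, 2018, "bbda", 27, 29
)

# Hypothetical helper: .filter is a quosure, evaluated inside filter() via !!
sum_filter_expr <- function(.df, .filter) {
  .df %>%
    filter(!!.filter) %>%
    group_by(id, year) %>%
    summarise(sold = sum(items),
              cost = sum(cost),
              .groups = "drop")
}

# A named list of arbitrary filter expressions instead of a character vector
filter_list <- list(
  toycars = rlang::quo(str_detect(identifier, "^a")),
  pens    = rlang::quo(str_detect(identifier, "^b") & items > 20)
)

# Apply each filter, then stack the results with the list names as a column
long <- filter_list %>%
  map(~ sum_filter_expr(data.tibble, .x)) %>%
  bind_rows(.id = "name")

long
```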

library(tidyverse)

data.tibble <- tribble(                      # sample data
  ~id,~year, ~identifier, ~items, ~cost,
  10, 2018, "aaca" , 10, 25, # "aaca" toy cars
  20, 2018, "aaca" , 12, 28, # "aaca" toy cars
  10, 2018, "bbda" , 14, 30, # "bbda" pens 
  20, 2018, "bbda" , 27, 29, # "bbda" pens
)

sum_filter <- function(.df, .filter) {

  .df %>% 
    group_by(id, year) %>% 
    filter(str_detect(identifier, .filter)) %>%
    # summarise() returns one row per (id, year) even if several rows match
    summarise(sold = sum(items),
              cost = sum(cost),
              .groups = "drop")

}

filter_vec <- c("toycars" = "^a",
                "pens" = "^b")

tibble(data = list(data.tibble)) %>%
  crossing(filters = filter_vec) %>% 
  mutate(name = names(filter_vec),
         filtered_data = map2(data, filters, sum_filter)) %>% 
  select(name, filtered_data) %>% 
  unnest(cols = filtered_data) %>% 
  pivot_wider(names_from = name,
              values_from = c(sold, cost))

#> # A tibble: 2 x 6
#>      id  year sold_toycars sold_pens cost_toycars cost_pens
#>   <dbl> <dbl>        <dbl>     <dbl>        <dbl>     <dbl>
#> 1    10  2018           10        14           25        30
#> 2    20  2018           12        27           28        29

Created on 2020-06-04 by the reprex package (v0.3.0)
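One caveat with the code above: `crossing()` de-duplicates and sorts its inputs, so the `filters` column comes out in sorted order, and `mutate(name = names(filter_vec))` labels the rows correctly only because "^a" sorts before "^b", matching the order of the names. Below is a sketch that keeps each name paired with its pattern by building the table with `tibble::enframe()` and dropping `crossing()`. It uses a version of `sum_filter` based on `summarise()`, and the reversed order of `filter_vec` is deliberate, to show that the labels still line up:

```r
library(tidyverse)

data.tibble <- tribble(
  ~id, ~year, ~identifier, ~items, ~cost,
  10, 2018, "aaca", 10, 25,
  20, 2018, "aaca", 12, 28,
  10, 2018, "bbda", 14, 30,
  20, 2018, "bbda", 27, 29
)

sum_filter <- function(.df, .filter) {
  .df %>%
    filter(str_detect(identifier, .filter)) %>%
    group_by(id, year) %>%
    summarise(sold = sum(items),
              cost = sum(cost),
              .groups = "drop")
}

# Deliberately not in sorted order
filter_vec <- c("pens" = "^b",
                "toycars" = "^a")

# enframe() turns the named vector into a two-column tibble, one row per item,
# so each name stays on the same row as its pattern
result <- enframe(filter_vec, name = "name", value = "filters") %>%
  mutate(filtered_data = map(filters, ~ sum_filter(data.tibble, .x))) %>%
  select(name, filtered_data) %>%
  unnest(cols = filtered_data) %>%
  pivot_wider(names_from = name,
              values_from = c(sold, cost))

result
```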
