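How do I tell R (dplyr) to "reset" a filter so that I can filter a second time within the same pipeline? Otherwise I would have to write a for-loop for every identifier. The minimal working example below highlights the problem I am facing.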
library(tidyverse)
data.tibble <- tribble( # sample data
~id,~year, ~identifier, ~items, ~cost,
10, 2018, "aaca" , 10, 25, # "aaca" toy cars
20, 2018, "aaca" , 12, 28, # "aaca" toy cars
10, 2018, "bbda" , 14, 30, # "bbda" pens
20, 2018, "bbda" , 27, 29, # "bbda" pens
)
a <- data.tibble %>% # first block works fine on its own
  group_by(id, year) %>%
  filter(str_detect(identifier, "^a")) %>% # keep identifiers that begin with "a"
  summarise(toycars_sold = sum(items),
            toycars_cost = sum(cost))
a
b <- data.tibble %>% # second block works fine on its own
  group_by(id, year) %>%
  filter(str_detect(identifier, "^b")) %>% # keep identifiers that begin with "b"
  summarise(pens_sold = sum(items),
            pens_cost = sum(cost))
b
I run into trouble when I ask dplyr to filter on another identifier within the same pipeline; I then get an error message:
data.tibble %>%
  group_by(id, year) %>%
  filter(str_detect(identifier, "^a")) %>%
  summarise(toycars_sold = sum(items),
            toycars_cost = sum(cost)) %>%
  filter(str_detect(identifier, "^b")) %>% # the error is raised here
  summarise(pens_sold = sum(items),
            pens_cost = sum(cost))
What I would like to end up with is:
c <- full_join(a,b)
There is a myriad of codes ("identifiers") that I will have to go through (sometimes there is more than one identifier for a single item).
R then tells me that the object 'identifier' cannot be found.
Any help would be greatly appreciated.
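I have a problem that I cannot seem to wrap my head around. My question is: after calling the first summarise() function, how do I tell the tidyverse to reset the filter? Otherwise I will have to create a for-loop for every "id-code" (I believe regular expression is the correct term) that I want to filter on.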
output <- vector("list", length(object18)) # list to store the output in
for (i in seq_along(object18)) { # object18: list to loop over, here items of stores in 2018
  output[[i]] <- object18[[i]] %>%
    group_by(storeid, month, year, quarter) %>% # variables to group by
    filter(str_detect(itemcode, "^CODE")) %>% # CODE stands for some identifier (a string)
    summarize(toys = sum(items),
              max.items.sold = max(items)) %>%
    filter(str_detect(itemcode, "^NEWCODE")) %>% # NEWCODE: possibly multiple codes; THIS SECOND FILTER DOESN'T WORK
    summarize(toys2 = sum(items),
              itemstoy2 = max(items))
}
Does anyone have an idea how I can achieve this?
Please don't be harsh with me, I am new to R.
Thanks in advance, David.
Answer 0 (score: 0)
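There is no way to "roll back" a filter() and return to the original, unfiltered data within the same pipeline. It might be possible to implement such functionality, but the tidyverse offers better options for producing the same output.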
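To illustrate why the second filter() fails: summarise() keeps only the grouping columns and the newly created summaries, so identifier no longer exists afterwards. The sketch below shows this, followed by one possible alternative that is not the approach of this answer, namely subsetting inside a single summarise() call. It assumes dplyr 1.0.0 or later for the .groups argument; the object names after_first and wide are made up for the illustration.

```r
library(dplyr)
library(stringr)

# Same sample data as in the question
data.tibble <- tibble::tribble(
  ~id, ~year, ~identifier, ~items, ~cost,
  10, 2018, "aaca", 10, 25,
  20, 2018, "aaca", 12, 28,
  10, 2018, "bbda", 14, 30,
  20, 2018, "bbda", 27, 29
)

# After summarise() only the grouping columns and the new summaries remain
after_first <- data.tibble %>%
  group_by(id, year) %>%
  filter(str_detect(identifier, "^a")) %>%
  summarise(toycars_sold = sum(items),
            toycars_cost = sum(cost),
            .groups = "drop")
names(after_first) # "identifier" is not among these names

# Alternative sketch: subset inside summarise(), so no second filter() is needed
wide <- data.tibble %>%
  group_by(id, year) %>%
  summarise(toycars_sold = sum(items[str_detect(identifier, "^a")]),
            toycars_cost = sum(cost[str_detect(identifier, "^a")]),
            pens_sold    = sum(items[str_detect(identifier, "^b")]),
            pens_cost    = sum(cost[str_detect(identifier, "^b")]),
            .groups = "drop")
```

This stays readable for a handful of identifiers; with many codes, the function-based approach described in the remainder of this answer scales better.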
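For this kind of problem I would:
1. Define a custom function that takes a data.frame and your regular-expression filter (as a string) as arguments and returns the sums of sold and cost.
2. Define a named vector with the item names as names and the regex filters as values.
3. Wrap the existing data in a list inside a tibble, cross it with the vector from step 2, and add the vector names as a new column.
4. Apply the custom function defined in step 1 with map2 to generate the filtered data sets.
5. Select the name column and the column containing the filtered data, then unnest the latter.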
You now have the data in long format, which is already a good format for many tasks. As a final step, you can bring it into the desired wide format with pivot_wider().
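If you want to filter on more than just regular expressions, you need to create a list of expressions (instead of a character vector) and apply this list of filters in the same way.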
library(tidyverse)
data.tibble <- tribble( # sample data
~id,~year, ~identifier, ~items, ~cost,
10, 2018, "aaca" , 10, 25, # "aaca" toy cars
20, 2018, "aaca" , 12, 28, # "aaca" toy cars
10, 2018, "bbda" , 14, 30, # "bbda" pens
20, 2018, "bbda" , 27, 29, # "bbda" pens
)
sum_filter <- function(.df, .filter) {
  .df %>%
    group_by(id, year) %>%
    filter(str_detect(identifier, .filter)) %>%
    # summarise() returns one row per group even if several rows match
    summarise(sold = sum(items),
              cost = sum(cost),
              .groups = "drop")
}
filter_vec <- c("toycars" = "^a",
"pens" = "^b")
tibble(data = list(data.tibble)) %>%
  crossing(filters = filter_vec) %>%
  # crossing() sorts its inputs, so look the names up by value, not by position
  mutate(name = names(filter_vec)[match(filters, filter_vec)],
         filtered_data = map2(data, filters, sum_filter)) %>%
  select(name, filtered_data) %>%
  unnest(cols = filtered_data) %>%
  pivot_wider(names_from = name,
              values_from = c(sold, cost))
#> # A tibble: 2 x 6
#>      id  year sold_toycars sold_pens cost_toycars cost_pens
#>   <dbl> <dbl>        <dbl>     <dbl>        <dbl>     <dbl>
#> 1    10  2018           10        14           25        30
#> 2    20  2018           12        27           28        29
Created on 2020-06-04 by the reprex package (v0.3.0)
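For the case mentioned above where the filters are more than plain regular expressions, the following is a minimal sketch of the "list of expressions" idea, not code from the original answer. The names filter_exprs, sum_filter_expr, long and the "pricey_pens" condition are hypothetical and chosen only for illustration; the sketch assumes dplyr 1.0.0 or later and uses rlang::exprs() to capture the conditions and !! to inject each one into filter().

```r
library(dplyr)
library(purrr)
library(stringr)

# Same sample data as in the question
data.tibble <- tibble::tribble(
  ~id, ~year, ~identifier, ~items, ~cost,
  10, 2018, "aaca", 10, 25,
  20, 2018, "aaca", 12, 28,
  10, 2018, "bbda", 14, 30,
  20, 2018, "bbda", 27, 29
)

# Hypothetical filters: arbitrary conditions stored as unevaluated expressions
filter_exprs <- rlang::exprs(
  toycars     = str_detect(identifier, "^a"),
  pricey_pens = str_detect(identifier, "^b") & cost > 29
)

# Like sum_filter(), but takes an expression and injects it into filter()
sum_filter_expr <- function(.df, .expr) {
  .df %>%
    group_by(id, year) %>%
    filter(!!.expr) %>%
    summarise(sold = sum(items),
              cost = sum(cost),
              .groups = "drop")
}

# Apply every filter to the full data and stack the results in long format;
# the list names end up in the "name" column
long <- map(filter_exprs, ~ sum_filter_expr(data.tibble, .x)) %>%
  bind_rows(.id = "name")
```

Because each filter is applied to the complete data set, nothing has to be "reset"; the long result can then be passed to pivot_wider() exactly as in the answer's code.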