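I am trying to summarise a data frame to create two summaries:
- the number of orders where only QUOT or QUOG appears
- the number of orders where QUOT or QUOG appears along with other Holds
Below is the start of the code: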
library(dplyr)
dat <- data.frame(Order = c(123,123,123,145,145,189,210,210,123,123,164),
Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)
test <- dat %>%
group_by(Order, Location) %>%
.....
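I'm stuck trying to work out whether a particular order contains only QUOT or QUOG, and then whether it contains QUOT or QUOG along with other holds.
Expected output: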
Location Only Multiple
1 Chicago 1 2
2 Charlotte 1 1
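So for the expected output:
- Order 123 in Chicago has QUOT and other holds (ENGR and VEND), so it counts as Multiple for Chicago
- Order 145 in Chicago has QUOG and another hold (ENGR), so it counts as Multiple for Chicago
- Order 189 in Chicago has QUOT and no other holds, so it counts as Only for Chicago
- Order 210 in Chicago has neither QUOT nor QUOG, so it is excluded from the counts
- Order 123 in Charlotte has QUOT and another hold (CUST), so it counts as Multiple for Charlotte
- Order 164 in Charlotte has QUOT and no other holds, so it counts as Only for Charlotte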
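The rules in the list above can be written as a small classifier over the holds of a single order. This is only a minimal sketch in base R: the function name `classify_holds` and the "Excluded" label are made up for illustration and do not come from the question.

```r
# Hypothetical helper: classify one order from its vector of holds.
classify_holds <- function(holds, target = c("QUOT", "QUOG")) {
  has_target <- any(holds %in% target)    # at least one QUOT/QUOG
  has_other  <- any(!(holds %in% target)) # at least one other hold
  if (!has_target) {
    "Excluded"
  } else if (has_other) {
    "Multiple"
  } else {
    "Only"
  }
}

classify_holds(c("QUOT", "ENGR", "VEND"))  # order 123, Chicago: "Multiple"
classify_holds(c("QUOT"))                  # order 189, Chicago: "Only"
classify_holds(c("ENGR", "VEND"))          # order 210, Chicago: "Excluded"
```

Counting the "Only" and "Multiple" results per Location then gives the expected table.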
Answer 0 (score: 3)
I think this should work; you may want to test it against some other orders:
library(dplyr)
library(tidyr)
dat <- data.frame(
Order = c(123,123,123,145,145,189,210,210,123,123,164),
Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)
dat %>%
  group_by(Order, Location) %>%
  mutate(
    quot_or_quog = Hold %in% c("QUOT", "QUOG"),
    # 1 if the order's holds are all of one kind, 2 if QUOT/QUOG is mixed with others
    distinct_quot_or_quog = n_distinct(quot_or_quog)
  ) %>%
  # Remove those that do not have "QUOT" or "QUOG"
  filter(quot_or_quog) %>%
  mutate(
    label = if_else(distinct_quot_or_quog == 1, "Only", "Multiple")
  ) %>%
  # dplyr >= 1.0.0 names this argument `.add`; `add` is deprecated there
  group_by(label, add = TRUE) %>%
  summarise(num_label = n_distinct(label)) %>%
  group_by(Location, label) %>%
  count(num_label) %>%
  pivot_wider(
    names_from = label,
    values_from = n
  ) %>%
  select(-num_label)
#> # A tibble: 2 x 3
#> # Groups: Location [2]
#> Location Multiple Only
#> <fct> <int> <int>
#> 1 Charlotte 1 1
#> 2 Chicago 2 1
Created on 2020-02-24 by the reprex package (v0.3.0)
Answer 1 (score: 0)
Here is another solution using dplyr and tidyr. This time we pivot first, then filter and summarise to get the result.
library(dplyr)
library(tidyr)
dat.summary <- dat %>%
  mutate(hold_count = 1) %>%
  # One row per Order/Location, one column per Hold (1 if present, NA if not)
  pivot_wider(names_from = Hold, values_from = hold_count) %>%
  mutate(
    only = if_else((QUOT == 1 | QUOG == 1) & is.na(ENGR) & is.na(VEND) & is.na(CUST), 1, 0),
    multiple = if_else((QUOT == 1 | QUOG == 1) & (ENGR == 1 | VEND == 1 | CUST == 1), 1, 0)
  ) %>%
  group_by(Location) %>%
  # if_else() yields NA where the condition is NA; na.rm = TRUE ignores those rows
  summarise(only = sum(only, na.rm = TRUE), multiple = sum(multiple, na.rm = TRUE))
dat.summary
which gives you
# A tibble: 2 x 3
Location only multiple
<fct> <dbl> <dbl>
1 Charlotte 1 1
2 Chicago 1 2
Data
dat <- data.frame(
Order = c(123,123,123,145,145,189,210,210,123,123,164),
Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)
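The conditions in this answer name every non-target hold (ENGR, VEND, CUST) explicitly, so a new hold type would need a code change. A possible variant that avoids this is sketched below. It is not part of the original answer, and the names `target`, `has_target`, `has_other` and `res` are assumptions chosen for illustration. It computes two per-order flags first and then counts them per Location.

```r
library(dplyr)

dat <- data.frame(
  Order = c(123,123,123,145,145,189,210,210,123,123,164),
  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago",
               "Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)

target <- c("QUOT", "QUOG")  # holds of interest (assumed name)

res <- dat %>%
  group_by(Location, Order) %>%
  # One row per order: does it have a target hold, and does it have any other hold?
  summarise(
    has_target = any(Hold %in% target),
    has_other  = any(!(Hold %in% target))
  ) %>%
  group_by(Location) %>%
  # Orders without a target hold contribute to neither column
  summarise(
    Only     = sum(has_target & !has_other),
    Multiple = sum(has_target & has_other)
  )

res
```

For the sample data this gives Only = 1 and Multiple = 2 for Chicago, and Only = 1 and Multiple = 1 for Charlotte, matching the expected output in the question.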