R: Summarize based on multiple conditions in dplyr

Date: 2020-02-24 18:36:10

Tags: r dplyr

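I am trying to summarize a data frame to create two summaries:

  1. A count of the orders where only QUOT or QUOG appears
  2. A count of the orders where QUOT or QUOG appears together with other Holds

Here is the start of the code: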

library(dplyr)


dat <- data.frame(Order = c(123,123,123,145,145,189,210,210,123,123,164), 
                  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
                  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)


test <- dat %>%
  group_by(Order, Location) %>%

  .....

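I am stuck on working out whether a particular order contains only QUOT or QUOG, and then whether it contains QUOT or QUOG along with other Holds.

Expected output: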

   Location Only Multiple
1   Chicago    1        2
2 Charlotte    1        1

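So, for the expected output:

  • Order 123 in Chicago: it has QUOT and other holds (ENGR, VEND), so it counts as Multiple for Chicago
  • Order 145 in Chicago: it has QUOG and another hold (ENGR), so it counts as Multiple for Chicago
  • Order 189 in Chicago: it has QUOT and no other holds, so it counts as Only for Chicago
  • Order 210 in Chicago: it has neither QUOT nor QUOG, so it is excluded from the counts
  • Order 123 in Charlotte: it has QUOT and another hold (CUST), so it counts as Multiple for Charlotte
  • Order 164 in Charlotte: it has QUOT and no other holds, so it counts as Only for Charlotte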
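
The classification in the list above can be sketched as two flags per (Location, Order) group: whether the group has a QUOT or QUOG hold, and whether it has any other hold. This is only a minimal sketch, not the method used in the answers below. The names `targets`, `per_order`, `has_target`, `has_other`, and `result` are made up for illustration, and it assumes that an order holding both QUOT and QUOG but nothing else should count as Only, which the question does not spell out.

```r
library(dplyr)

dat <- data.frame(
  Order = c(123, 123, 123, 145, 145, 189, 210, 210, 123, 123, 164),
  Location = c("Chicago", "Chicago", "Chicago", "Chicago", "Chicago", "Chicago",
               "Chicago", "Chicago", "Charlotte", "Charlotte", "Charlotte"),
  Hold = c("QUOT", "ENGR", "VEND", "QUOG", "ENGR", "QUOT",
           "ENGR", "VEND", "QUOT", "CUST", "QUOT"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the hold codes that make an order eligible for counting
targets <- c("QUOT", "QUOG")

# One row per (Location, Order) with two logical flags
per_order <- dat %>%
  group_by(Location, Order) %>%
  summarise(
    has_target = any(Hold %in% targets),   # order has QUOT or QUOG
    has_other  = any(!Hold %in% targets)   # order has some other hold
  ) %>%
  ungroup()

# Drop orders without QUOT/QUOG, then count each kind per location
result <- per_order %>%
  filter(has_target) %>%
  group_by(Location) %>%
  summarise(
    Only     = sum(!has_other),
    Multiple = sum(has_other)
  )

print(result)
```

Because the second flag tests for "anything that is not a target", the sketch does not need to name ENGR, VEND, or CUST, so a new hold code would be picked up without changes.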

2 answers:

Answer 0: (score: 3)

I think this should work, though you may want to test it with some other orders:

library(dplyr)
library(tidyr)

dat <- data.frame(
  Order = c(123,123,123,145,145,189,210,210,123,123,164), 
  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)

dat %>% 
    group_by(Order, Location) %>% 
    mutate(
        quot_or_quog = Hold %in% c("QUOT", "QUOG"),
        distinct_quot_or_quog = n_distinct(quot_or_quog)
    ) %>% 
    # Remove those that do not have "QUOT" or "QUOG"
    filter(quot_or_quog) %>% 
    mutate(
        label = if_else(distinct_quot_or_quog == 1, "Only", "Multiple")
    ) %>% 
    group_by(label, add = TRUE) %>%
    summarise(num_label = n_distinct(label)) %>% 
    group_by(Location, label) %>%
    count(num_label) %>% 
    pivot_wider(
        names_from = label,
        values_from = n
    ) %>% 
    select(-num_label)
#> # A tibble: 2 x 3
#> # Groups:   Location [2]
#>   Location  Multiple  Only
#>   <fct>        <int> <int>
#> 1 Charlotte        1     1
#> 2 Chicago          2     1

Created on 2020-02-24 by the reprex package (v0.3.0)

Answer 1: (score: 0)

Here is another solution using dplyr and tidyr. This one pivots first, then filters and summarises to reach your result.

library(dplyr)
library(tidyr)

dat.summary <- dat %>%
  mutate(hold_count = 1) %>% 
  pivot_wider(names_from = Hold, values_from = hold_count) %>% 
  mutate(only = if_else((QUOT == 1 | QUOG == 1) & is.na(ENGR) & is.na(VEND) & is.na(CUST), 1, 0),
         multiple = if_else((QUOT == 1 | QUOG == 1) & (ENGR == 1 | VEND == 1 | CUST ==1), 1, 0)) %>% 
  group_by(Location) %>% 
  summarise(only = sum(only, na.rm = TRUE), multiple = sum(multiple, na.rm = TRUE))

dat.summary

which gives you

# A tibble: 2 x 3
  Location   only multiple
  <fct>     <dbl>    <dbl>
1 Charlotte     1        1
2 Chicago       1        2

Data

dat <- data.frame(
  Order = c(123,123,123,145,145,189,210,210,123,123,164), 
  Location = c("Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Chicago","Charlotte","Charlotte","Charlotte"),
  Hold = c("QUOT","ENGR","VEND","QUOG","ENGR","QUOT","ENGR","VEND","QUOT","CUST","QUOT")
)