How can I summarize data by filtering?

Time: 2020-02-11 20:42:30

Tags: r dplyr tidyverse

I have a dataset like this:

df_A <- tribble(
  ~product_name,    ~position,   ~cat_id,  ~pr,
        "A",             1,          1,    "X", 
        "A",             4,          2,    "X",
        "A",             3,          3,    "X",
        "B",             4,          5,     NA,
        "B",             6,          6,     NA,
        "C",             3,          1,    "Y",
        "C",             5,          2,    "Y",
        "D",             6,          2,    "Z",
        "D",             4,          8,    "Z",
        "D",             3,          9,    "Z",
)

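Now I want to look up the values 1 and 2 in cat_id and return the matching position for each product_name. If cat_id has no 1 or 2 for a product, the corresponding variables should be returned as NA. Please see my desired dataset for a better idea: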

product_name

How can I get it?

1 answer:

Answer 0: (score: 1)

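We can filter the rows based on 'cat_id', then use complete to expand the dataset in case some 'product_name' values are missing after filtering, and reshape to 'wide' format with pivot_wider. The rows added by complete have an NA 'cat_id', which pivot_wider turns into a column named NA, so that column is dropped at the end.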

library(dplyr)
library(tidyr)
library(stringr)
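df_A %>%
   filter(cat_id %in% 1:2) %>%                            # keep only cat_id 1 and 2
   mutate(cat_id = str_c('position_', cat_id)) %>%        # build the new column names
   complete(product_name = unique(df_A$product_name)) %>% # re-add products that were filtered out
   pivot_wider(names_from = cat_id, values_from = position) %>%
   select(-`NA`)                                          # drop the column created by the NA cat_id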
# A tibble: 4 x 4
#  product_name pr    position_1 position_2
#  <chr>        <chr>      <dbl>      <dbl>
#1 A            X              1          4
#2 B            <NA>          NA         NA
#3 C            Y              3          5
#4 D            Z             NA          6
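
A variant of the same idea, as a minimal sketch rather than part of the original answer: reshape only the filtered rows and then join them back onto the full list of products, so no temporary NA column is created. The names products, wide and result are just illustrative. One difference to be aware of: here 'pr' is taken from the original data, so a product without cat_id 1 or 2 keeps its own 'pr' value instead of getting NA (for this data the result is the same, because B's 'pr' is already NA).

```r
library(dplyr)
library(tidyr)
library(tibble)

df_A <- tribble(
  ~product_name, ~position, ~cat_id, ~pr,
  "A", 1, 1, "X",
  "A", 4, 2, "X",
  "A", 3, 3, "X",
  "B", 4, 5, NA,
  "B", 6, 6, NA,
  "C", 3, 1, "Y",
  "C", 5, 2, "Y",
  "D", 6, 2, "Z",
  "D", 4, 8, "Z",
  "D", 3, 9, "Z"
)

# One row per product, including products that have no cat_id 1 or 2
products <- distinct(df_A, product_name, pr)

# Positions for cat_id 1 and 2, spread into one column per cat_id
wide <- df_A %>%
  filter(cat_id %in% 1:2) %>%
  pivot_wider(
    id_cols = product_name,
    names_from = cat_id,
    values_from = position,
    names_prefix = "position_"
  )

# Products missing from `wide` get NA in the position columns
result <- left_join(products, wide, by = "product_name")
result
```

Using names_prefix also replaces the mutate/str_c step, so stringr is not needed in this version.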

Or using reshape/subset from base R

reshape(merge(data.frame(product_name = unique(df_A$product_name)), 
   subset(df_A, cat_id %in% 1:2), all.x = TRUE), 
   idvar = c('product_name', 'pr'), direction = 'wide', timevar = 'cat_id')[-5]