Select the top N values per group

Posted: 2019-10-29 14:27:32

Tags: r dataframe dplyr

I have a dataset:

data <- tribble(
  ~shop_name,  ~category,       ~NumberOfProducts,
  "A",         "Game",            50,
  "A",         "Book",            40,
  "A",         "Electronic",      30,
  "B",         "Home",            90, 
  "B",         "Game",           100,
  "B",         "Electronic",      50,
  "C",         "Home",            60, 
  "C",         "Book",            30, 
  "A",         "Garden",          15,
  "B",         "Garden",          10,
)

Now I want to create a new data frame that looks like this:

newdata <- tribble(
  ~shop_name,  ~top_category,
   "A",        "Game, Book, Electronic",  
   "B",        "Game, Home, Electronic",
   "C",        "Home, Book"
)

In other words, I want to group the data by "shop_name" and then create a new variable, "top_category", that lists each shop's top three categories ranked by "NumberOfProducts".

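I have tried to code it, but when I write it like this I only get the top three rows of the whole dataset, not the top three per shop: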

data %>% top_n(3, NumberOfProducts)

Could someone help me build the new data frame showing each shop's top three categories?

2 answers:

Answer 0 (score: 2)

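You are on the right track with "group the data". You can use group_by to apply top_n separately for each shop. To collapse each shop's categories into a single row, you can then use summarize together with toString: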
data %>% 
  group_by(shop_name) %>% 
  top_n(3, NumberOfProducts) %>% 
  summarize(top_category = toString(category))

# A tibble: 3 x 2
# shop_name top_category          
# <chr>     <chr>                 
# 1 A         Game, Book, Electronic
# 2 B         Home, Game, Electronic
# 3 C         Home, Book

Answer 1 (score: 1)

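You can group by shop_name, keep the top three rows per shop, sort them with arrange(-NumberOfProducts) so the categories come out in descending order of NumberOfProducts, and then use summarise to paste the top categories together: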

data %>% group_by(shop_name) %>% 
 top_n(3, NumberOfProducts) %>%
 arrange(-NumberOfProducts) %>%
 summarise(top_category = paste(category, collapse = ", "))

# A tibble: 3 x 2
  shop_name top_category          
  <chr>     <chr>                 
1 A         Game, Book, Electronic
2 B         Game, Home, Electronic
3 C         Home, Book
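In the first answer's output, shop B's categories appear in their original row order (Home, Game, Electronic) because top_n() keeps the input order rather than sorting. A minimal alternative sketch, assuming dplyr >= 1.0.0 (where top_n() is superseded by slice_max()), does the filtering and the descending sort in one step. with_ties = FALSE is an assumption about the desired behavior: it caps each shop at exactly three categories even if product counts tie.

```r
library(dplyr)
library(tibble)

data <- tribble(
  ~shop_name, ~category,    ~NumberOfProducts,
  "A",        "Game",        50,
  "A",        "Book",        40,
  "A",        "Electronic",  30,
  "B",        "Home",        90,
  "B",        "Game",       100,
  "B",        "Electronic",  50,
  "C",        "Home",        60,
  "C",        "Book",        30,
  "A",        "Garden",      15,
  "B",        "Garden",      10
)

newdata <- data %>%
  group_by(shop_name) %>%
  # Keep at most 3 rows per shop, returned in descending NumberOfProducts order;
  # with_ties = FALSE (an assumption) never returns more than 3 rows on ties
  slice_max(NumberOfProducts, n = 3, with_ties = FALSE) %>%
  # Collapse each shop's categories into one comma-separated string
  summarise(top_category = toString(category))

print(newdata)
```

With this data the result matches the desired newdata, including the "Game, Home, Electronic" order for shop B.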