I have a dataset:
library(dplyr)  # provides %>%, tribble(), group_by(), top_n() and summarise()
data <- tribble(
  ~shop_name, ~category, ~NumberOfProducts,
  "A", "Game", 50,
  "A", "Book", 40,
  "A", "Electronic", 30,
  "B", "Home", 90,
  "B", "Game", 100,
  "B", "Electronic", 50,
  "C", "Home", 60,
  "C", "Book", 30,
  "A", "Garden", 15,
  "B", "Garden", 10
)
Now I want to create a new data frame that looks like this:
newdata <- tribble(
  ~shop_name, ~top_category,
  "A", "Game, Book, Electronic",
  "B", "Game, Home, Electronic",
  "C", "Home, Book"
)
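That is, I want to group the data by "shop_name" and then create a new variable (top_category) that lists the top three categories of each shop by "NumberOfProducts".
I have already tried to code this, but written like this I only get the top three rows of the whole dataset: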
data %>% top_n(3, NumberOfProducts)
Can anyone help me get the new data frame showing the top three categories per shop?
Answer 0 (score: 2)
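You are on the right track with "grouping my data". You can use group_by to apply the top_n function per shop. To collapse each shop's rows into a single row, you can then use summarize with toString: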
data %>%
  group_by(shop_name) %>%
  top_n(3, NumberOfProducts) %>%
  summarize(top_category = toString(category))
# A tibble: 3 x 2
# shop_name top_category
# <chr> <chr>
# 1 A Game, Book, Electronic
# 2 B Home, Game, Electronic
# 3 C Home, Book
Answer 1 (score: 1)
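You can group by shop_name, then use summarise and paste the top categories together. The arrange step sorts the rows by descending NumberOfProducts, so the categories in each string are listed from largest to smallest: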
data %>%
  group_by(shop_name) %>%
  top_n(3, NumberOfProducts) %>%
  arrange(-NumberOfProducts) %>%
  summarise(top_category = paste(category, collapse = ", "))
# A tibble: 3 x 2
# shop_name top_category
# <chr> <chr>
# 1 A Game, Book, Electronic
# 2 B Game, Home, Electronic
# 3 C Home, Book
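As a side note, top_n() is marked as superseded in dplyr 1.0.0 and later in favour of slice_max(). The sketch below is one possible way to write the same pipeline, assuming dplyr >= 1.0.0 is installed. Because slice_max() returns the selected rows ordered from largest to smallest, a separate arrange() step should not be needed here. The variable name result is only for illustration and does not appear in the original posts.

```r
library(dplyr)

# Same example data as in the question
data <- tribble(
  ~shop_name, ~category, ~NumberOfProducts,
  "A", "Game", 50,
  "A", "Book", 40,
  "A", "Electronic", 30,
  "B", "Home", 90,
  "B", "Game", 100,
  "B", "Electronic", 50,
  "C", "Home", 60,
  "C", "Book", 30,
  "A", "Garden", 15,
  "B", "Garden", 10
)

# result is a hypothetical name used only for this sketch
result <- data %>%
  group_by(shop_name) %>%
  # keep the 3 rows with the largest NumberOfProducts within each shop
  slice_max(NumberOfProducts, n = 3) %>%
  # collapse each shop's categories into one comma-separated string
  summarise(top_category = toString(category))

print(result)
```

If several categories in a shop could share the same NumberOfProducts, slice_max() keeps all tied rows by default; passing with_ties = FALSE limits each shop to at most three rows.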