Creating a list-column from one column of a data frame

Asked: 2018-01-25 19:07:03

Tags: r list tidyr tidyverse

I have a dataset that looks like this (I have already tidied and grouped the data):

# A tibble: 2,903 x 5
# Groups: MarketName [201]
   MarketName                                  Season1Date                  x     y product   
   <chr>                                       <chr>                    <dbl> <dbl> <chr>     
 1 Abbotsford Farmers Market                   May to October           -90.3  44.9 Bakedgoods
 2 Amery Farmers Market                        June to October          -92.4  45.3 Bakedgoods
 3 Appleton Downtown Farm Market               06/18/2016 to 10/29/2016 -88.4  44.3 Bakedgoods
 4 Barker's Island Farmers Market              05/20/2017 to 10/28/2017 -92.1  46.7 Bakedgoods
 5 Black River Falls Community Farmers' Market 05/24/2014 to 10/25/2014 -90.8  44.3 Bakedgoods
 6 Black River Falls Downtown Farmer's Market  06/05/2014 to 09/25/2014 -90.9  44.3 Bakedgoods
 7 Boscobel Farmers Market                     05/09/2015 to 10/17/2015 -90.7  43.1 Bakedgoods
 8 Bristol Farmers Market                      June to October          -88.0  42.6 Bakedgoods
 9 Brookfield Farmers Market                   05/07/2016 to 10/29/2016 -88.1  43.1 Bakedgoods
10 Brown Deer Farmers Market                   06/14/2017 to 10/25/2017 -88.0  43.2 Bakedgoods
# ... with 2,893 more rows

Each market has many rows, one per product. For example, I filtered the data down to one particular MarketName and got this:

# A tibble: 11 x 4
# Groups: MarketName [1]
   MarketName               x     y product   
   <chr>                <dbl> <dbl> <chr>     
 1 Amery Farmers Market -92.4  45.3 Bakedgoods
 2 Amery Farmers Market -92.4  45.3 Cheese    
 3 Amery Farmers Market -92.4  45.3 Flowers   
 4 Amery Farmers Market -92.4  45.3 Herbs     
 5 Amery Farmers Market -92.4  45.3 Vegetables
 6 Amery Farmers Market -92.4  45.3 Honey     
 7 Amery Farmers Market -92.4  45.3 Jams      
 8 Amery Farmers Market -92.4  45.3 Maple     
 9 Amery Farmers Market -92.4  45.3 Meat      
10 Amery Farmers Market -92.4  45.3 Plants    
11 Amery Farmers Market -92.4  45.3 Soap

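How can I turn the product column into a list-column, so that each market has just one row holding its list of products? I would like to end up with something like this: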

   MarketName               x     y product   
   <chr>                <dbl> <dbl> <chr>     
 1 Amery Farmers Market -92.4  45.3 Bakedgoods, Cheese, Flowers, Herbs, etc.

1 answer:

Answer 0 (score: 2)

Recreating the data:

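library(dplyr)  # %>%, as_tibble(), group_by(), summarize()
library(tidyr)  # nest()

product <- c("Baked Goods", "Cheese", "Flowers",
               "Herbs", "Vegetables", "Honey",
               "Jams", "Maple", "Meat", "Plants", "Soap")

# One row per product, all belonging to the same market
df <- data.frame(MarketName = "Amery Farmers Market", x = -92.4, y = 45.3,
                 product = product, stringsAsFactors = FALSE) %>% as_tibble()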

Nest solution

If you want a list-col, you can try nesting:

# tidyr >= 1.0.0 syntax; older versions accepted the unnamed form nest(product)
df %>% nest(data = product)
# A tibble: 1 x 4
  MarketName               x     y data             
  <chr>                <dbl> <dbl> <list>           
1 Amery Farmers Market -92.4  45.3 <tibble [11 × 1]>
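
Note that nesting gives a list-column of one-column tibbles. If a list-column of plain character vectors is closer to what the question asks for, one possible alternative (not part of the original answer) is to wrap the column in `list()` inside `summarise()`. A minimal sketch, assuming the same toy data as above and dplyr >= 1.0.0 for the `.groups` argument:

```r
library(dplyr)

# Same toy data as above, rebuilt so this snippet runs on its own
df <- tibble(
  MarketName = "Amery Farmers Market",
  x = -92.4,
  y = 45.3,
  product = c("Baked Goods", "Cheese", "Flowers", "Herbs", "Vegetables",
              "Honey", "Jams", "Maple", "Meat", "Plants", "Soap")
)

# One row per market; `product` becomes a list-column of character vectors
df_list <- df %>%
  group_by(MarketName, x, y) %>%
  summarise(product = list(product), .groups = "drop")

# Each element is an ordinary character vector
df_list$product[[1]]
```

The `df_list` name is only illustrative. Grouping by `x` and `y` as well keeps the coordinates in the result, which works here because they are constant within a market.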

Summarise solution

Or summarise, if you want a single string containing all the product names:

df %>%
  group_by(MarketName, x, y) %>%
  summarize(product = paste(product, collapse = ", "))
# A tibble: 1 x 4
# Groups:   MarketName, x [?]
  MarketName               x     y product                                                          
  <chr>                <dbl> <dbl> <chr>                                                            
1 Amery Farmers Market -92.4  45.3 Baked Goods, Cheese, Flowers, Herbs, Vegetables, Honey, Jams, Ma…
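
As the `# Groups:` line in the output shows, the summarised result is still grouped by `MarketName` and `x`. If that is not wanted, a small variation is to call `ungroup()` afterwards. The sketch below also swaps in base R's `toString()`, which collapses with `", "` just like the `paste()` call above. The `df_string` name is illustrative, and the data is the same toy example:

```r
library(dplyr)

# Same toy data as above, rebuilt so this snippet runs on its own
df <- tibble(
  MarketName = "Amery Farmers Market",
  x = -92.4,
  y = 45.3,
  product = c("Baked Goods", "Cheese", "Flowers", "Herbs", "Vegetables",
              "Honey", "Jams", "Maple", "Meat", "Plants", "Soap")
)

# Collapse the products into one comma-separated string per market,
# then drop the leftover grouping
df_string <- df %>%
  group_by(MarketName, x, y) %>%
  summarise(product = toString(product)) %>%
  ungroup()

df_string$product
```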