Using dplyr

Time: 2019-12-03 03:02:55

Tags: r dataframe dplyr

Below is a sample of a subset of a dataframe I have in R. It holds information about companies, with rows labelled by category: company_name, no_workers, product, address, contact_person.

comp_df <- structure(list(
  desc = c("AAA", "Company", "Ltd", "fish", "344", "12", "West", "Road", "Bob C",
           "BBB", "Enteprises", "vegetables", "12", "North", "Perak", "Simon T",
           "EF", "Industries", "cement", "8800", "Green", "Lane", "Singapore", "Sylvia P"),
  category = c("company_name", "company_name", "company_name", "product", "no_workers",
               "address", "address", "address", "contact_person",
               "company_name", "company_name", "product", "no_workers",
               "address", "address", "contact_person",
               "company_name", "company_name", "product", "no_workers",
               "address", "address", "address", "contact_person")),
  row.names = c(NA, -24L), class = c("tbl_df", "tbl", "data.frame"))

Is there a simple way to add a function to my dplyr pipeline that converts the comp_df above into something like the following?

(The desired output was shown as an image in the original post.)

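1 Answer:

Answer 0 (score: 2)

Assuming that, in the category column of the original data frame, the first company_name value in each consecutive run marks the start of a new group, you can do this: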

library(dplyr)
library(tidyr)

comp_df %>%
  group_by(category, grp = cumsum(category == "company_name" & lag(category, default = "") != "company_name")) %>%
  summarise(desc = paste(desc, collapse =  " ")) %>%
  pivot_wider(id_cols = grp, names_from = category, values_from = desc)

# A tibble: 3 x 6
    grp address              company_name    contact_person no_workers product   
  <int> <chr>                <chr>           <chr>          <chr>      <chr>     
1     1 12 West Road         AAA Company Ltd Bob C          344        fish      
2     2 North Perak          BBB Enteprises  Simon T        12         vegetables
3     3 Green Lane Singapore EF Industries   Sylvia P       8800       cement
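
The grouping key is the least obvious part of the pipeline above, so it may help to look at it on its own. The following is a minimal sketch in base R rather than the answer's exact code: it uses only the category values of the first two companies from comp_df, and it assumes that `c("", head(category, -1))` is an acceptable stand-in for `lag(category, default = "")`. The names `prev` and `is_start` are introduced here purely for illustration.

```r
# Category values for the first two companies in the question's comp_df
category <- c("company_name", "company_name", "company_name", "product",
              "no_workers", "address", "address", "address", "contact_person",
              "company_name", "company_name", "product", "no_workers",
              "address", "address", "contact_person")

# Base R stand-in (an assumption of this sketch) for lag(category, default = "")
prev <- c("", head(category, -1))

# TRUE only on the first company_name row of each consecutive run
is_start <- category == "company_name" & prev != "company_name"

# A running count of those starts gives every row of a company the same id
grp <- cumsum(is_start)
print(grp)
```

Because only the first company_name row in a run increments the counter, multi-word names such as "AAA Company Ltd" stay in a single group. This relies on the assumption stated in the answer: each company's block of rows begins with company_name.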