How to group and combine columns in R

Asked: 2015-01-22 15:32:19

Tags: r data.table

I have this data frame:

d

structure(list(Product = structure(c(3L, 1L, 2L, 4L, 4L, 6L, 
4L, 5L), .Label = c("App_Servers ", "Db_servers,application ", 
"Server1,Serve2,Server4", "Server1,Serve2,Server4 ", "Server1,Serve2,Server4  ", 
"Server1,Serve2,Sever4 "), class = "factor"), Day = structure(c(3L, 
5L, 4L, 5L, 2L, 4L, 1L, 1L), .Label = c("Mon ", "Thu ", "Tue", 
"Tue ", "Wed "), class = "factor"), Date = structure(c(1L, 2L, 
3L, 4L, 5L, 6L, 7L, 7L), .Label = c(" 2015-01-06 ", "2015-01-07 ", 
"2015-01-13 ", "2015-01-14 ", "2015-01-15 ", "2015-01-20 ", "2015-02-16 "
), class = "factor"), Month = structure(c(2L, 2L, 2L, 2L, 2L, 
2L, 1L, 1L), .Label = c("Feb", "Jan"), class = "factor")), .Names = c("Product", 
"Day", "Date", "Month"), class = "data.frame", row.names = c(NA, 
-8L))

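I need to be able to put the dates in one cell, separated by commas, grouped by Product, Day and Month. For example,

Server1,Serve2,Server4 occurs on 2015-01-06,2015-01-14,2015-01-15,2015-01-20 in January.

My new df needs to look like this: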

Product                Day  Date    Month  Day_list
Server1,Serve2,Server4 Tues 2015-01-06 Jan 2015-01-06,2015-01-13,2015-01-20 

Are there any packages that could help me do this in R?

I tried using the data.table package:

d[,d:=paste(Date,Date), c("Product","Day","Month")]

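It does not work.

2 answers:

Answer 0: (score: 2)

There are a few things here.

First, your columns contain extra spaces. You have to remove them before the values can be grouped together.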

require(data.table)
setDT(d)[, `:=`(Product = gsub("[ ]", "", Product),
                Date    = gsub("[ ]", "", Date))]
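
To see why the stray spaces matter, here is a minimal base R sketch. The vector x is a made-up stand-in for the Product column, not the real data. Note that gsub("[ ]", "", ...) removes every space, including any inside a value, while trimws() (available since R 3.2.0) only strips leading and trailing whitespace; which one is appropriate depends on your data.

```r
# Hypothetical stand-in for a column with inconsistent padding
x <- c("Server1,Serve2,Server4",
       "Server1,Serve2,Server4 ",
       "Server1,Serve2,Server4  ")

# Before cleaning, each padded variant counts as its own group
n_before <- length(unique(x))

# trimws() strips only leading and trailing whitespace
cleaned <- trimws(x)
n_after <- length(unique(cleaned))

# gsub("[ ]", "", ...) removes all spaces, internal ones included
squeezed <- gsub("[ ]", "", "App Servers ")

print(c(before = n_before, after = n_after))
print(squeezed)
```

If a value may legitimately contain spaces, trimws() is probably the safer choice.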

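Second, you are using paste() incorrectly. paste(Date, Date) pastes each date to itself element by element; use the collapse argument to join all the dates in a group into one string, and assign the result with `:=`: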

d[, Date_list := paste(Date, collapse=","), by=c("Product", "Month")]
d
#                   Product  Day       Date Month                        Date_list
# 1: Server1,Serve2,Server4  Tue 2015-01-06   Jan 2015-01-06,2015-01-14,2015-01-15
# 2:            App_Servers Wed  2015-01-07   Jan                       2015-01-07
# 3: Db_servers,application Tue  2015-01-13   Jan                       2015-01-13
# 4: Server1,Serve2,Server4 Wed  2015-01-14   Jan 2015-01-06,2015-01-14,2015-01-15
# 5: Server1,Serve2,Server4 Thu  2015-01-15   Jan 2015-01-06,2015-01-14,2015-01-15
# 6:  Server1,Serve2,Sever4 Tue  2015-01-20   Jan                       2015-01-20
# 7: Server1,Serve2,Server4 Mon  2015-02-16   Feb            2015-02-16,2015-02-16
# 8: Server1,Serve2,Server4 Mon  2015-02-16   Feb            2015-02-16,2015-02-16
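
If you would rather have one row per group than a repeated column, a summarising j expression may be closer to what you want. This is only a sketch on made-up data (the table dt and its values are hypothetical); unique() is added on the assumption that a date repeated within a group should be listed once.

```r
library(data.table)

# Hypothetical, already-cleaned data
dt <- data.table(
  Product = c("A", "A", "B", "A", "A"),
  Month   = c("Jan", "Jan", "Jan", "Feb", "Feb"),
  Date    = c("2015-01-06", "2015-01-14", "2015-01-07",
              "2015-02-16", "2015-02-16")
)

# .() in j returns one row per (Product, Month) group instead of
# adding a column by reference; unique() drops dates repeated in a group
res <- dt[, .(Date_list = paste(unique(Date), collapse = ",")),
          by = .(Product, Month)]
print(res)
```

With `:=` every original row is kept and the joined string is repeated; with .() the result has a single row per group.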

Have a look at the Introduction to data.table and Reference semantics vignettes.

Edit: I just realised that row 6 has a typo in Product. It has Sever4 instead of Server4.
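
A rough way to spot typos like this before grouping is to count the distinct values and compare suspicious pairs. The sketch below uses a made-up vector (products is hypothetical) and base R only; adist() comes from the utils package that ships with R.

```r
# Hypothetical cleaned Product values, including the misspelled one
products <- c("Server1,Serve2,Server4", "Server1,Serve2,Sever4",
              "Server1,Serve2,Server4", "App_Servers")

# Counts per distinct value: a value that occurs only once may be a typo
counts <- table(products)
print(counts)

# Edit distance between two suspicious values; a small distance
# suggests they may be the same thing spelled differently
d_edit <- adist("Server1,Serve2,Server4", "Server1,Serve2,Sever4")
print(d_edit)
```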

Answer 1: (score: 0)

Here is one solution using dplyr:

library(dplyr)

d %>%
  mutate(
    # strip the stray spaces so identical values group together
    Product = gsub("[ ]", "", Product),
    Day     = gsub("[ ]", "", Day),
    Date    = gsub("[ ]", "", Date)
  ) %>%
  group_by(Product, Month) %>%
  # join each group's dates with commas
  mutate(Day_list = paste(Date, collapse = ",")) %>%
  ungroup() %>%
  as.data.frame()

                 Product Day       Date Month                         Day_list
1 Server1,Serve2,Server4 Tue 2015-01-06   Jan 2015-01-06,2015-01-14,2015-01-15
2            App_Servers Wed 2015-01-07   Jan                       2015-01-07
3 Db_servers,application Tue 2015-01-13   Jan                       2015-01-13
4 Server1,Serve2,Server4 Wed 2015-01-14   Jan 2015-01-06,2015-01-14,2015-01-15
5 Server1,Serve2,Server4 Thu 2015-01-15   Jan 2015-01-06,2015-01-14,2015-01-15
6  Server1,Serve2,Sever4 Tue 2015-01-20   Jan                       2015-01-20
7 Server1,Serve2,Server4 Mon 2015-02-16   Feb            2015-02-16,2015-02-16
8 Server1,Serve2,Server4 Mon 2015-02-16   Feb            2015-02-16,2015-02-16
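
For comparison, the same grouping can be sketched without any extra packages using base R's aggregate(). The data frame df below is made up and assumed to be already cleaned; this returns one row per Product/Month pair rather than adding a column to the original rows.

```r
# Hypothetical, already-cleaned data
df <- data.frame(
  Product = c("A", "A", "B", "A"),
  Month   = c("Jan", "Jan", "Jan", "Feb"),
  Date    = c("2015-01-06", "2015-01-14", "2015-01-07", "2015-02-16"),
  stringsAsFactors = FALSE
)

# One row per Product/Month, with that group's dates joined by commas
out <- aggregate(Date ~ Product + Month, data = df,
                 FUN = function(x) paste(x, collapse = ","))
print(out)
```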