R: Merge duplicate entries and conditionally select dates by group

Time: 2019-09-25 15:12:01

Tags: r

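I have a dataset of several hundred companies, with employees grouped by company id. For some companies there are multiple entries for the same employee, albeit with different start and stop dates.

I would like to merge or remove the duplicate employee entries while keeping the earlier of the two start dates and the later of the two end dates. My dataset looks like this: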

df <- structure(list(id = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
  employee = c("culver", "maguire", "florenzano","cretu", "tran", "ryman",
  "menezes", "dancause", "schumaker", "tyler", "cretu", "tran", "menezes"),
  started = structure(c(15014, 15014, 15014, 15279, 15279, 15279, 15279, 15279, 15279, 15279, 15706, 15492, 15706), class = "Date"),
  ended = structure(c(18157, 15126, 15126, 15949, 15949, 15461, 15705, 15461, 15461, 15584, 18157,
  15706, 15876), class = "Date")), row.names = c(NA, -13L), class = c("tbl_df","tbl", "data.frame"), .Names = c("id", "employee", "started","ended"))

You can see that company 2 has duplicate entries for Cretu, Tran, and Menezes. The final dataset should look like this:

df2 <- structure(list(id = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  employee = c("culver", "maguire", "florenzano","cretu", "tran", "ryman",
  "menezes", "dancause", "schumaker", "tyler"),
  started = structure(c(15014, 15014, 15014, 15279, 15279, 15279, 15279, 15279, 15279, 15279), class = "Date"),
  ended = structure(c(18157, 15126, 15126, 18157, 15949, 15461, 15876, 15461, 15461, 15584), class = "Date")), row.names = c(NA, -10L), class = c("tbl_df","tbl", "data.frame"), .Names = c("id", "employee", "started","ended"))

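I have tried a number of solutions involving mutate, which.min, and which.max, but none of them worked. There should be a tidy solution for this, but I can't figure it out. Any help would be much appreciated.

2 answers:

Answer 0: (score: 1)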

library(dplyr)

# One row per (id, employee): earliest start date, latest end date
df %>% group_by(id, employee) %>%
  summarise(started = min(started), ended = max(ended)) %>%
  ungroup()
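
One detail worth noting: summarise() returns its rows sorted by the grouping keys, so the employees come back in alphabetical order within each company rather than in the order shown in df2. If the original order matters, a minimal sketch along the following lines may help. It rebuilds the question's data with tibble() so that it runs on its own, and first_row is a helper column name introduced here purely for illustration; it is not part of the original answer.

```r
library(dplyr)

# Same values as the question's df, rebuilt so this block is self-contained
df <- tibble(
  id = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),
  employee = c("culver", "maguire", "florenzano", "cretu", "tran", "ryman",
               "menezes", "dancause", "schumaker", "tyler", "cretu", "tran",
               "menezes"),
  started = as.Date(c(15014, 15014, 15014, 15279, 15279, 15279, 15279, 15279,
                      15279, 15279, 15706, 15492, 15706), origin = "1970-01-01"),
  ended = as.Date(c(18157, 15126, 15126, 15949, 15949, 15461, 15705, 15461,
                    15461, 15584, 18157, 15706, 15876), origin = "1970-01-01")
)

merged <- df %>%
  mutate(first_row = row_number()) %>%   # remember each row's original position
  group_by(id, employee) %>%
  summarise(
    started   = min(started),            # earliest start per employee
    ended     = max(ended),              # latest end per employee
    first_row = min(first_row)           # position of the first occurrence
  ) %>%
  ungroup() %>%
  arrange(first_row) %>%                 # restore first-appearance order
  select(-first_row)                     # drop the helper column

merged
```

With this data, merged should have 10 rows in the same employee order as df2, with cretu and menezes carrying the later of their two end dates.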

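Answer 1: (score: 0)

That did it. The initial code from @slava-kohut, combined with the suggestion from @IceCreamToucan, returned the correct result. Thank you both for your help.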