Remove duplicate values in one column and return the latest value from another column

Time: 2017-01-11 22:30:59

Tags: r

I created the following dataset to reproduce my problem. Some of the module/file names appear more than once.

owaspSample <- data.frame(Module=c("AccessDetails.java","AccessDiverse.java","BgField.java","BgStatus.java","CmdDate.java","CmdGameDate.java","CommentDate.java","CostDate.java","EntranceDetails.java","GameDate.java","LdPopDate.java","LeaseCostDate.java","PastApprovalDate.java","ProvisioningDate.java","ReservationDate.java","RefDate.java","ServiceDate.java","StatusDate.java","ProfileDate.java","UpdateCmdDate.java","ViewDate.java","AccessDetails.java","AccessDiverse.java","AuthenticationDate.java","CmdDate.java","CmdSummaryDate.java","CmdViewDate.java","ChangeOrderDate.java","CommentDate.java","CostDate.java","GameDate.java","LdPopDate.java","LeaseCostDate.java","PastApprovalDate.java","ReservationDate.java","RefDate.java","UnderwaterCmdDate.java","WaveDate.java","XmlFormatter.java"),
Category = c("SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","XML External Entity Injection"),
scanDate=c("2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24"),
VulnCount = c("13","15"," 1"," 3","15"," 2","11","30"," 2"," 2"," 2"," 2"," 4"," 2"," 3"," 9"," 1"," 1"," 1"," 8"," 6","25","28"," 3","30"," 1"," 6"," 5","20","23"," 3"," 3"," 4","10"," 3","17"," 1"," 3"," 2"),
Owasp = c("A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A01-Injection"))

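I run the following to remove the duplicates, and it seems to work. However, it keeps the first occurrence of each module, and I want to keep the duplicate with the latest date instead. The date has to be dynamic.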

owaspSample <- owaspSample[!duplicated(owaspSample$Module),]

For example, given this:

Module                  Category        Date        VulnCount   Owasp
CostDate.java           SQL Injection   2016-10-23      30      A00-SQL Injection
EntranceDetails.java    SQL Injection   2016-10-23      2       A00-SQL Injection
GameDate.java           SQL Injection   2016-10-23      2       A00-SQL Injection
CostDate.java           SQL Injection   2016-10-24      23      A00-SQL Injection
GameDate.java           SQL Injection   2016-10-24      3       A00-SQL Injection

The expected output should be:

Module                  Category        Date        VulnCount   Owasp
EntranceDetails.java    SQL Injection   2016-10-23      2       A00-SQL Injection
CostDate.java           SQL Injection   2016-10-24      23      A00-SQL Injection
GameDate.java           SQL Injection   2016-10-24      3       A00-SQL Injection

Any ideas on how to do this?

2 answers:

Answer 0: (score: 0)

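I used nicola's suggestion, duplicated with fromLast = TRUE, and I did not lose the file names that have no duplicates.

# Keep the last occurrence of each Module (rows are already ordered by scanDate)
owaspSample <- owaspSample[!duplicated(owaspSample$Module, fromLast = TRUE),]

I had also run a unique step before this line, thinking the two do the same thing. It is not needed and is left out here: owaspSample[unique(owaspSample$Module),] uses the module names as a row index (factor codes or row names) instead of filtering by value, so it can drop or blank out rows.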
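The approach above keeps the last occurrence of each Module, so it only returns the latest scan when the rows are already in date order. A minimal sketch of making that explicit is shown below. It assumes a small hypothetical subset of the question's data (the data frame df and the result name latest are illustrative, not from the original post) and sorts by scanDate before dropping duplicates.

```r
# Hypothetical subset of the question's data, deliberately out of date order
df <- data.frame(
  Module    = c("CostDate.java", "EntranceDetails.java", "GameDate.java",
                "CostDate.java", "GameDate.java"),
  scanDate  = c("2016-10-24", "2016-10-23", "2016-10-23",
                "2016-10-23", "2016-10-24"),
  VulnCount = c(23, 2, 2, 30, 3),
  stringsAsFactors = FALSE
)

# Sort by date so the last row of each Module is its most recent scan
df <- df[order(as.Date(df$scanDate)), ]

# fromLast = TRUE flags the earlier occurrences, so the last one is kept
latest <- df[!duplicated(df$Module, fromLast = TRUE), ]
print(latest)
```

Because the sort is done on the actual dates, no date value is hard-coded, and modules that appear only once are kept as they are.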

Answer 1: (score: 0)

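We can do this with dplyr. After grouping by Module, slice takes the last row of each group. This returns the latest scan only if the rows are already ordered by scanDate, as they are in the sample data.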

library(dplyr)
owaspSample %>%
  group_by(Module) %>%
  slice(n())
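If the rows are not guaranteed to be in date order, the same dplyr idea can be combined with an explicit sort. The sketch below is one possible variant, not the answerer's original code. It assumes a small hypothetical subset of the question's data (df and latest are illustrative names) and converts scanDate to Date before arranging.

```r
library(dplyr)

# Hypothetical subset of the question's data, deliberately out of date order
df <- data.frame(
  Module    = c("CostDate.java", "EntranceDetails.java", "GameDate.java",
                "CostDate.java", "GameDate.java"),
  scanDate  = c("2016-10-24", "2016-10-23", "2016-10-23",
                "2016-10-23", "2016-10-24"),
  VulnCount = c(23, 2, 2, 30, 3),
  stringsAsFactors = FALSE
)

latest <- df %>%
  mutate(scanDate = as.Date(scanDate)) %>%  # compare real dates, not strings
  arrange(scanDate) %>%                     # oldest scan first
  group_by(Module) %>%
  slice(n()) %>%                            # last row per Module = latest scan
  ungroup()

print(latest)
```

The result has one row per Module, carrying the VulnCount from its most recent scanDate.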