I created the following dataset to reproduce my problem. Some of my module/file names are duplicated.
owaspSample <- data.frame(Module=c("AccessDetails.java","AccessDiverse.java","BgField.java","BgStatus.java","CmdDate.java","CmdGameDate.java","CommentDate.java","CostDate.java","EntranceDetails.java","GameDate.java","LdPopDate.java","LeaseCostDate.java","PastApprovalDate.java","ProvisioningDate.java","ReservationDate.java","RefDate.java","ServiceDate.java","StatusDate.java","ProfileDate.java","UpdateCmdDate.java","ViewDate.java","AccessDetails.java","AccessDiverse.java","AuthenticationDate.java","CmdDate.java","CmdSummaryDate.java","CmdViewDate.java","ChangeOrderDate.java","CommentDate.java","CostDate.java","GameDate.java","LdPopDate.java","LeaseCostDate.java","PastApprovalDate.java","ReservationDate.java","RefDate.java","UnderwaterCmdDate.java","WaveDate.java","XmlFormatter.java"),
Category = c("SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","SQL Injection","XML External Entity Injection"),
scanDate=c("2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-23","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24","2016-10-24"),
VulnCount = c("13","15"," 1"," 3","15"," 2","11","30"," 2"," 2"," 2"," 2"," 4"," 2"," 3"," 9"," 1"," 1"," 1"," 8"," 6","25","28"," 3","30"," 1"," 6"," 5","20","23"," 3"," 3"," 4","10"," 3","17"," 1"," 3"," 2"),
Owasp = c("A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A00-SQL Injection","A01-Injection"))
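I run the following to remove duplicates, and it seems to work. However, for each duplicated module I want to keep the row with the most recent date. The dates must be handled dynamically, not hard-coded.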
owaspSample <- owaspSample[!duplicated(owaspSample$Module),]
For example, given this:
Module                Category       Date        VulnCount  Owasp
CostDate.java         SQL Injection  2016-10-23  30         A00-SQL Injection
EntranceDetails.java  SQL Injection  2016-10-23  2          A00-SQL Injection
GameDate.java         SQL Injection  2016-10-23  2          A00-SQL Injection
CostDate.java         SQL Injection  2016-10-24  23         A00-SQL Injection
GameDate.java         SQL Injection  2016-10-24  3          A00-SQL Injection
The expected output would be:
Module                Category       Date        VulnCount  Owasp
EntranceDetails.java  SQL Injection  2016-10-23  2          A00-SQL Injection
CostDate.java         SQL Injection  2016-10-24  23         A00-SQL Injection
GameDate.java         SQL Injection  2016-10-24  3          A00-SQL Injection
Any ideas on how to do this?
Answer 0: (score: 0)
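I used nicola's suggestion and added a sort by scanDate before it, so the latest row for each module is kept and I don't lose the file names that aren't duplicated.
owaspSample <- owaspSample[order(as.Date(owaspSample$scanDate)),]
owaspSample <- owaspSample[!duplicated(owaspSample$Module, fromLast = TRUE),]
The first line puts the rows in date order, so the last occurrence of each module is its most recent scan. The second line keeps only that last occurrence. Together they gave me the expected result.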
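As a quick check, here is a minimal sketch of the sort-then-`duplicated(..., fromLast = TRUE)` idea applied to the five-row example from the question. The names `smallSample` and `latest` are made up for illustration, and the columns are cut down to the ones that matter; it assumes scanDate is in a format `as.Date` can parse.

```r
# Hypothetical five-row subset mirroring the example in the question.
smallSample <- data.frame(
  Module    = c("CostDate.java", "EntranceDetails.java", "GameDate.java",
                "CostDate.java", "GameDate.java"),
  scanDate  = c("2016-10-23", "2016-10-23", "2016-10-23",
                "2016-10-24", "2016-10-24"),
  VulnCount = c(30, 2, 2, 23, 3),
  stringsAsFactors = FALSE
)

# Sort oldest to newest so the last row for each Module is its latest scan.
smallSample <- smallSample[order(as.Date(smallSample$scanDate)), ]

# Scanning from the end, flag earlier repeats of a Module and drop them.
latest <- smallSample[!duplicated(smallSample$Module, fromLast = TRUE), ]
print(latest)
```

This should leave EntranceDetails.java from 2016-10-23 plus the 2016-10-24 rows for CostDate.java and GameDate.java, as in the expected output in the question.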
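Answer 1: (score: 0)
We can do this with dplyr. After grouping by 'Module', slice the last row of each group. This relies on the rows already being in scanDate order, as they are in the sample data.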
library(dplyr)
owaspSample %>%
group_by(Module) %>%
slice(n())
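If the rows might not already be in date order, one possible variant is to sort by scanDate before grouping, so that the last row in each group really is the latest scan. This is only a sketch: `smallSample` and `latestPerModule` are made-up names, the data is a cut-down copy of the example in the question, and it assumes scanDate can be parsed by `as.Date`.

```r
library(dplyr)

# Hypothetical subset of the question's data, deliberately out of date order.
smallSample <- data.frame(
  Module    = c("CostDate.java", "GameDate.java", "CostDate.java",
                "EntranceDetails.java", "GameDate.java"),
  scanDate  = c("2016-10-24", "2016-10-24", "2016-10-23",
                "2016-10-23", "2016-10-23"),
  VulnCount = c(23, 3, 30, 2, 2),
  stringsAsFactors = FALSE
)

latestPerModule <- smallSample %>%
  arrange(as.Date(scanDate)) %>%  # oldest first, newest last
  group_by(Module) %>%
  slice(n()) %>%                  # last row per Module = latest scan
  ungroup()

print(latestPerModule)
```

The result has one row per Module, ordered by Module name rather than by the original row order.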