I have a data.frame as shown below.
> df2 <- data.frame("StudentId" = c(1,1,1,2,2,3,3), "Subject" = c("Maths", "Maths", "English","Maths", "English", "Science", "Science"), "Score" = c(100,90,80,70, 60,20,10))
> df2
StudentId Subject Score
1 1 Maths 100
2 1 Maths 90
3 1 English 80
4 2 Maths 70
5 2 English 60
6 3 Science 20
7 3 Science 10
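For a few StudentIds, the Subject column has duplicate values (for example, ID 1 has 2 entries for "Maths"). I only need to keep the first of the duplicated rows. The expected data.frame is: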
StudentId Subject Score
1 1 Maths 100
3 1 English 80
4 2 Maths 70
5 2 English 60
6 3 Science 20
I have not been able to do this. Any ideas?
Answer 0 (score: 4)
We can use unique from 'data.table' with the by option, after converting the data.frame to a 'data.table' (setDT(df2)). This keeps the first row for each StudentId/Subject combination.
library(data.table)
unique(setDT(df2), by = c("StudentId", "Subject"))
# StudentId Subject Score
#1: 1 Maths 100
#2: 1 English 80
#3: 2 Maths 70
#4: 2 English 60
#5: 3 Science 20
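Or distinct from 'dplyr':
library(dplyr)
# .keep_all = TRUE retains Score; in dplyr 0.5.0 and later, distinct()
# otherwise returns only the columns listed
distinct(df2, StudentId, Subject, .keep_all = TRUE)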
# StudentId Subject Score
# (dbl) (fctr) (dbl)
#1 1 Maths 100
#2 1 English 80
#3 2 Maths 70
#4 2 English 60
#5 3 Science 20
Or with duplicated from base R.
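The code for the base R variant did not survive in this copy of the answer, so the following is only a minimal sketch of how duplicated could be applied here, not necessarily the original snippet. It rebuilds the example under the assumed name df as a plain data.frame, because setDT(df2) above converts df2 to a data.table by reference and data.frame-style column indexing would then behave differently. The names is_dup and result are likewise illustrative.

```r
# Rebuild the example as a plain data.frame (hypothetical name `df`)
df <- data.frame(
  StudentId = c(1, 1, 1, 2, 2, 3, 3),
  Subject   = c("Maths", "Maths", "English", "Maths", "English", "Science", "Science"),
  Score     = c(100, 90, 80, 70, 60, 20, 10)
)

# duplicated() flags each row whose (StudentId, Subject) pair
# has already appeared in an earlier row
is_dup <- duplicated(df[c("StudentId", "Subject")])

# Keep only the first row for each pair
result <- df[!is_dup, ]
result
```

Selecting the key columns by name rather than by position keeps the call valid if the column order changes.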
EDIT: Based on suggestions from @David Arenburg.