Question

I have a data.frame as shown below.

> df2 <- data.frame("StudentId" = c(1,1,1,2,2,3,3), "Subject" = c("Maths", "Maths", "English","Maths", "English", "Science", "Science"), "Score" = c(100,90,80,70, 60,20,10))
> df2
  StudentId Subject Score
1         1   Maths   100
2         1   Maths    90
3         1 English    80
4         2   Maths    70
5         2 English    60
6         3 Science    20
7         3 Science    10

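For a few StudentIds, the Subject column has duplicate values (e.g. ID 1 has 2 entries for "Maths"). I need to keep only the first of the duplicate rows. The expected data.frame is: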

  StudentId Subject Score
1         1   Maths   100
3         1 English    80
4         2   Maths    70
5         2 English    60
6         3 Science    20

I am not able to do this. Any ideas?

Answer 1

We can use unique with the by option from data.table after converting to a 'data.table' (setDT(df2)).

library(data.table)
unique(setDT(df2), by = c("StudentId", "Subject"))
#   StudentId Subject Score
#1:         1   Maths   100
#2:         1 English    80
#3:         2   Maths    70
#4:         2 English    60
#5:         3 Science    20

Or distinct from dplyr. Recent dplyr versions need .keep_all = TRUE to retain the Score column; without it only StudentId and Subject are returned.

library(dplyr)
distinct(df2, StudentId, Subject, .keep_all = TRUE)
#  StudentId Subject Score
#      (dbl)  (fctr) (dbl)
#1         1   Maths   100
#2         1 English    80
#3         2   Maths    70
#4         2 English    60
#5         3 Science    20

Or duplicated from base R.

EDIT: Based on suggestions from @David Arenburg.
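A minimal sketch of the base R option using duplicated, assuming the duplicate check is on the StudentId and Subject columns. This is an illustration and not necessarily the exact code the answer used; the variable name result is introduced here only for the example.

```r
# Rebuild the example data from the question
df2 <- data.frame(
  StudentId = c(1, 1, 1, 2, 2, 3, 3),
  Subject   = c("Maths", "Maths", "English", "Maths", "English", "Science", "Science"),
  Score     = c(100, 90, 80, 70, 60, 20, 10)
)

# duplicated() flags each row whose (StudentId, Subject) pair has already
# appeared in an earlier row; negating it keeps the first occurrence of each pair
result <- df2[!duplicated(df2[c("StudentId", "Subject")]), ]
print(result)
```

Because this subsets the original data.frame, the original row names (1, 3, 4, 5, 6) are kept, which matches the expected output shown in the question.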

Filter duplicate rows in an R data.frame
