I have a data frame and I want to remove every row whose value appears more than once. For example, my data frame looks like this:
> df <- data.frame(A = c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad"), B = c(1, 2, 3, 4, 5, 6))
> df
         A B
1    Happy 1
2    Happy 2
3      Sad 3
4 Confused 4
5      Mad 5
6      Mad 6
I only want the rows whose entry in A is unique, so that I get:
         A B
1      Sad 3
2 Confused 4
Answer 0 (score: 5)
akrun seems to be collecting the different approaches, so here is another one in base R:
# seq_along() replaces as.numeric(df$A), which warns when A is character (the default since R 4.0)
df[ave(seq_along(df$A), df$A, FUN = length) == 1, ]
#         A B
#3      Sad 3
#4 Confused 4
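To see why the ave() filter works, it can help to look at the per-row group sizes it produces. The following is a minimal sketch on the question's data; the name `counts` is introduced here for illustration only, and seq_along(df$A) is used as the placeholder vector so that nothing has to be coerced to numeric.

```r
df <- data.frame(A = c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad"),
                 B = c(1, 2, 3, 4, 5, 6))

# For every row, count how many rows share its value of A.
# ave() applies FUN within each group of df$A and recycles the
# result back to the original row positions.
counts <- ave(seq_along(df$A), df$A, FUN = length)

# Keep only the rows whose value of A occurs exactly once.
result <- df[counts == 1, ]
```

Rows 3 and 4 are the only ones with a group size of 1, so they are the only ones kept.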
(I suppose the duplicated approach is the most commonly used one.)
Or using dplyr:
require(dplyr)
group_by(df, A) %>% filter(n() == 1)
Answer 1 (score: 4)
You could try duplicated:
df[!(duplicated(df$A)|duplicated(df$A,fromLast=TRUE)),]
#         A B
#3      Sad 3
#4 Confused 4
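The reason two calls to duplicated() are needed is that a single call only flags the second and later occurrences of a value, so the first "Happy" and the first "Mad" would survive. A small sketch on the question's data makes this visible; `dup_fwd`, `dup_bwd` and `keep` are illustrative names, not part of the answer above.

```r
df <- data.frame(A = c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad"),
                 B = c(1, 2, 3, 4, 5, 6))

# Scanning from the top flags every occurrence after the first.
dup_fwd <- duplicated(df$A)

# Scanning from the bottom flags every occurrence before the last.
dup_bwd <- duplicated(df$A, fromLast = TRUE)

# A row is unique only if neither scan flags it.
keep <- !(dup_fwd | dup_bwd)
result <- df[keep, ]
```

Combining both scans with `|` marks every member of a duplicated group, which is what the one-liner above relies on.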
Or
df[df$A %in% with(as.data.frame(table(df$A)), Var1[Freq==1]),]
#         A B
#3      Sad 3
#4 Confused 4
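The table() version can also be written in two steps, which may be easier to read. This is a sketch rather than the answer's exact code: it skips the as.data.frame() conversion and indexes the names of the frequency table directly. `tab` and `singletons` are names chosen here for illustration.

```r
df <- data.frame(A = c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad"),
                 B = c(1, 2, 3, 4, 5, 6))

# Frequency of each distinct value in A.
tab <- table(df$A)

# Values that occur exactly once.
singletons <- names(tab)[tab == 1]

# Keep the rows whose A is one of those values; row order is preserved.
result <- df[df$A %in% singletons, ]
```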
Or
df[colSums(sapply(df$A, `==`, df$A))==1,]
#         A B
#3      Sad 3
#4 Confused 4
Or
df[colSums(Vectorize(function(x) x==df$A)(df$A))==1,]
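Both colSums() variants build an n-by-n matrix that compares every value of A with every other value, so memory use grows quadratically with the number of rows. The sketch below shows the same idea with outer() swapped in for sapply()/Vectorize(); that substitution is an assumption made here for readability, and `eq` and `match_counts` are illustrative names.

```r
df <- data.frame(A = c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad"),
                 B = c(1, 2, 3, 4, 5, 6))

a <- as.character(df$A)

# eq[i, j] is TRUE when row i and row j have the same value of A.
eq <- outer(a, a, "==")

# Each column sum counts how many rows match that row (itself included).
match_counts <- colSums(eq)

# A count of 1 means the row only matches itself.
result <- df[match_counts == 1, ]
```

For large data frames the duplicated() or table() approaches are likely to be a better fit, since they avoid the full comparison matrix.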
Or using data.table (similar to @beginneR's use of ave):
library(data.table)
setDT(df)[,.SD[.N==1], by=A]
#          A B
#1:      Sad 3
#2: Confused 4
Or
setDT(df)[df[,.I[.N==1], by=A]$V1]
#          A B
#1:      Sad 3
#2: Confused 4
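One thing to keep in mind with the data.table versions is that setDT() converts df to a data.table by reference, so the original data.frame is changed in place. If that is not wanted, a sketch along the following lines works on a copy instead; it assumes the data.table package is installed, and `dt` and `result` are names introduced here.

```r
library(data.table)

df <- data.frame(A = c("Happy", "Happy", "Sad", "Confused", "Mad", "Mad"),
                 B = c(1, 2, 3, 4, 5, 6))

# as.data.table() returns a copy, so df itself stays a plain data.frame.
dt <- as.data.table(df)

# For each group of A, return its rows only when the group has one row.
result <- dt[, if (.N == 1L) .SD, by = A]
```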