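I want to remove rows that do not occur n times. That is, if a level of my factor variable occurs != n times, I want to remove all rows with that level.
Example data: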
df <- data.frame(
my_factor = factor(rep(1:24, each = 10)),
value = runif(240, min = -10, max = 125)
)
# Each factor appears 10 times
# Adding a row, that makes my_factor == 23 appear 11 times
x <- data.frame(
my_factor = 23,
value = 100)
df <- rbind(df, c(23, 100))
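Now I want to see how many times each level of my_factor occurs, and be able to remove all rows whose level occurs a number of times different from n.
In the example data, I want to remove all rows where the factor equals 23.
I tried to attack it with rle, but I can't seem to get the subsetting to work: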
y <- rle(as.character(df$my_factor))
y$lengths != 10
df[y$lengths != 10, ] # Wrong output
Answer 0 (score: 3)
You can use table instead:
table(df$my_factor) != 10
#
# 1 2 3 4 5 6 7 8 9 10 11
# FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# 12 13 14 15 16 17 18 19 20 21 22
# FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
# 23 24
# TRUE FALSE
names(which(table(df$my_factor) != 10))
# [1] "23"
df[!df$my_factor %in% names(which(table(df$my_factor) != 10)), ]
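The table approach above can be wrapped so that n is a parameter. The following is only a sketch: the helper name keep_n and its arguments are hypothetical and not part of the original answer, and the example data is rebuilt so the block runs on its own.

```r
# Hypothetical helper (an assumption, not from the original answer):
# keep only the rows whose level of column `f` occurs exactly `n` times.
keep_n <- function(data, f, n) {
  counts <- table(data[[f]])
  bad <- names(counts)[counts != n]       # levels with the wrong count
  data[!data[[f]] %in% bad, , drop = FALSE]
}

# Rebuild the example data: 24 levels x 10 rows, plus one extra row for "23"
df <- data.frame(
  my_factor = factor(rep(1:24, each = 10)),
  value = runif(240, min = -10, max = 125)
)
extra <- data.frame(
  my_factor = factor("23", levels = levels(df$my_factor)),
  value = 100
)
df <- rbind(df, extra)

res <- keep_n(df, "my_factor", 10)
nrow(res)                    # 230: the 11 rows of level "23" are gone
res <- droplevels(res)       # optional: drop the now-empty level "23"
```

Subsetting leaves "23" behind as an empty factor level, which is why droplevels() is applied at the end; skip it if the unused level should be kept.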
ave can also be used:
df[ave(1:nrow(df), df$my_factor, FUN = length) == 10, ]
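To see why the ave line works, it may help to look at what ave returns on a tiny vector. This is an illustrative sketch with made-up data (g is not from the question): with FUN = length, every position receives the size of its own group, so the result lines up row by row with the data frame and can be compared against n directly.

```r
# Made-up grouping vector, for illustration only
g <- factor(c("x", "y", "x", "y", "x"))

# Each element is replaced by the size of the group it belongs to
n_per_row <- ave(seq_along(g), g, FUN = length)
n_per_row        # 3 2 3 2 3

# A row-aligned logical mask, usable directly for subsetting
keep <- n_per_row == 3
g[keep]          # the three "x" entries
```

seq_along(g) is used here instead of 1:length(g) because the latter yields c(1, 0) for an empty input.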
Or, using "data.table":
library(data.table)
setDT(df)[, N := .N, by = my_factor][N == 10]
Answer 1 (score: 1)
You basically have it, but you need to sort the values before applying rle:
y <- rle(as.character(sort(df$my_factor))) #sort!
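Then, to select the rows whose level does not occur 10 times:
df[df$my_factor %in% y$values[y$lengths != 10], ]
or, to keep only the rows whose level occurs exactly 10 times:
df[df$my_factor %in% y$values[y$lengths == 10], ]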
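The reason sorting matters is that rle counts consecutive runs, not total occurrences. A small sketch with made-up values (f is not from the question) shows the difference:

```r
# Made-up vector: "a" occurs 3 times, but not all in one run
f <- c("a", "a", "b", "b", "a")

r_unsorted <- rle(f)
r_unsorted$values    # "a" "b" "a"  (the trailing "a" is a separate run)
r_unsorted$lengths   # 2 2 1

r_sorted <- rle(sort(f))
r_sorted$values      # "a" "b"
r_sorted$lengths     # 3 2        (now equal to the total counts)
```

In the question's data the appended row for 23 forms its own run of length 1, and y$lengths has one entry per run rather than per row, so it cannot be used directly as a row index.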
Answer 2 (score: 1)
tt = (table(df$my_factor) != 10)
# %in% instead of ==, so this also works when more than one level differs from 10
df[df$my_factor %in% names(tt)[which(tt)], ]
my_factor value
221 23 26.8138067
222 23 -2.8933233
223 23 34.9496288
224 23 39.8694566
225 23 -5.3975642
226 23 46.7891582
227 23 0.9553145
228 23 -5.8235961
229 23 64.8645187
230 23 22.3176873
241 23 100.0000000
# For the other rows:
head(df[!df$my_factor %in% names(tt)[which(tt)], ])
my_factor value
1 1 26.863197
2 1 6.200210
3 1 7.474887
4 1 -3.015083
5 1 65.889539
6 1 111.526084