Question

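I want to remove rows that do not occur n times. That is, if a level of my factor variable occurs != n times, I want to delete all rows with that level.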

Example data:

df <- data.frame( 
  my_factor = factor(rep(1:24, each = 10)),
  value     = runif(240, min = -10, max = 125)
)
# Each factor appears 10 times

# Adding a row, that makes my_factor == 23 appear 11 times
x <- data.frame(
  my_factor = 23, 
  value = 100)

df <- rbind(df, c(23, 100))

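Now I want to see how many times each value of my_factor occurs, and be able to delete all rows whose value occurs a number of times different from n.

In the example data, I want to delete all rows where the factor variable equals 23.

I tried to attack it with rle, but I can't seem to subset with the result: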

y <- rle(as.character(df$my_factor))
y$lengths != 10

df[y$lengths != 10, ] # Wrong output

Answer 1

You can use table instead:

table(df$my_factor) != 10
# 
#     1     2     3     4     5     6     7     8     9    10    11 
# FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 
#    12    13    14    15    16    17    18    19    20    21    22 
# FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE 
#    23    24 
#  TRUE FALSE 
names(which(table(df$my_factor) != 10))
# [1] "23"
df[!df$my_factor %in% names(which(table(df$my_factor) != 10)), ]

ave can also be used:

df[ave(1:nrow(df), df$my_factor, FUN = length) == 10, ]

Or, using "data.table":

library(data.table)
setDT(df)[, N := .N, by = my_factor][N == 10]
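
If n needs to vary, the ave approach above could be wrapped in a small helper. This is only a sketch: the function name keep_n_times and the toy data are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper: keep only the rows whose group occurs exactly n times.
keep_n_times <- function(data, group_col, n) {
  # Number of rows in each group, repeated for every row of that group
  counts <- ave(seq_len(nrow(data)), data[[group_col]], FUN = length)
  data[counts == n, , drop = FALSE]
}

# Toy data: "a" occurs 2 times, "b" 3 times, "c" 2 times
toy <- data.frame(
  g = factor(c("a", "a", "b", "b", "b", "c", "c")),
  v = 1:7
)

res <- keep_n_times(toy, "g", 2)
res
```

The result still carries "b" as an unused factor level; droplevels(res) removes it if that matters downstream.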

Answer 2

You basically had it, but you need to sort before calling rle, so that each level forms a single run:

 y <- rle(as.character(sort(df$my_factor))) #sort!

Then, to select the rows whose level does not occur 10 times:

 df[df$my_factor %in% y$values[y$lengths != 10], ]

or, to keep only the rows whose level occurs exactly 10 times:

 df[df$my_factor %in% y$values[y$lengths == 10], ]
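
To see why the sort matters: rle counts runs of consecutive equal values, so a level that shows up in two separate places is reported as two runs. The result also has one entry per run, not per row, which is why indexing df with y$lengths != 10 directly returns the wrong rows. A small sketch with a made-up vector:

```r
# The value "23" appears in two separate places, as in the question's data
f <- c("22", "23", "23", "24", "23")

# Unsorted: rle() reports "23" as two separate runs
unsorted <- rle(f)
unsorted$values   # "22" "23" "24" "23"
unsorted$lengths  # 1 2 1 1

# Sorted: each value forms one run, so the lengths are the true counts
sorted <- rle(sort(f))
sorted$values     # "22" "23" "24"
sorted$lengths    # 1 3 1
```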

Answer 3

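tt <- table(df$my_factor) != 10

# Rows whose level does not occur 10 times
# (%in% also works when several levels have the wrong count)
df[df$my_factor %in% names(tt)[which(tt)], ]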

    my_factor       value
221        23  26.8138067
222        23  -2.8933233
223        23  34.9496288
224        23  39.8694566
225        23  -5.3975642
226        23  46.7891582
227        23   0.9553145
228        23  -5.8235961
229        23  64.8645187
230        23  22.3176873
241        23 100.0000000

# For other rows:
head(df[!df$my_factor %in% names(tt)[which(tt)], ])
  my_factor      value
1         1  26.863197
2         1   6.200210
3         1   7.474887
4         1  -3.015083
5         1  65.889539
6         1 111.526084
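
One detail worth noting when more than one level has the wrong count: == compares element by element and recycles the shorter side, while %in% tests each row against the whole set of offending levels. A minimal sketch on made-up data (the toy data frame is an assumption, not from the question):

```r
# Toy data where two levels ("b" and "c") do not occur exactly 2 times
toy <- data.frame(g = factor(c("a", "a", "b", "c", "c", "c")), v = 1:6)

tt <- table(toy$g) != 2
bad <- names(which(tt))         # "b" "c"

# == recycles c("b", "c") along the rows and misses one of the "c" rows
sum(toy$g == bad)               # 3 matches instead of 4

# %in% checks membership, so every offending row is found
drop_rows <- toy[toy$g %in% bad, ]    # rows 3 to 6
keep_rows <- toy[!toy$g %in% bad, ]   # rows 1 and 2
```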

R - Delete rows with != n occurrences
