Question

Here is a data frame:

vegetables <- c("carrots", "carrots", "carrots", "carrots", "carrots")
animals <- c("cats", "dogs", "dogs", "fish", "cats")
df <- data.frame(vegetables, animals)

It looks like this:

> df
  vegetables animals
1    carrots    cats
2    carrots    dogs
3    carrots    dogs
4    carrots    fish
5    carrots    cats

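If I want to remove the rows whose level frequency is below some threshold, say 2 (fish in df, for example), I can drop those rows like this: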

for ( i in names(df) ) {
  df <- subset(df, with(df, df[,i] %in% names(which(table(df[,i]) >= 2))))
}

> df
  vegetables animals
1    carrots    cats
2    carrots    dogs
3    carrots    dogs
5    carrots    cats

But what if I don't want to remove the observations and instead want to replace fish with "bla"?

How would I do that?

Desired output:

> df
  vegetables animals
1    carrots    cats
2    carrots    dogs
3    carrots    dogs
4    carrots    bla
5    carrots    cats

Answer 1

I'm not sure whether the levels of the variables matter. If they don't, you can do the following, passing stringsAsFactors=FALSE as an option to data.frame:

vegetables <- c("carrots", "carrots", "carrots", "carrots", "carrots")
animals <- c("cats", "dogs", "dogs", "fish", "cats")
DF <- data.frame(vegetables, animals, stringsAsFactors = FALSE)

threshold <- 2
# %in% rather than ==, so the match stays correct if several values are rare
DF$animals[DF$animals %in% names(which(table(DF$animals) < threshold))] <- "foo"

DF
#  vegetables animals
#1    carrots    cats
#2    carrots    dogs
#3    carrots    dogs
#4    carrots     foo
#5    carrots    cats
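
When more than one value falls below the threshold, the same idea can be wrapped in a small helper. This is only a minimal sketch for character vectors; `replace_rare` and its argument names are hypothetical, not part of base R.

```r
# Hypothetical helper: replace values that occur fewer than `threshold` times.
replace_rare <- function(x, threshold = 2, replacement = "foo") {
  counts <- table(x)                          # frequency of each distinct value
  rare <- names(counts)[counts < threshold]   # values below the threshold
  x[x %in% rare] <- replacement               # works for zero, one, or many rare values
  x
}

animals <- c("cats", "dogs", "dogs", "fish", "cats", "bird")
replace_rare(animals)
# "cats" "dogs" "dogs" "foo"  "cats" "foo"
```

To treat every column of a character-only data frame the same way, something like `DF[] <- lapply(DF, replace_rare)` should do it.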

Answer 2

You can update the levels directly, using table to index the levels you want to change:

levels(df$animals)[table(df$animals) < 2] <- 'bla'

df
##   vegetables animals
## 1    carrots    cats
## 2    carrots    dogs
## 3    carrots    dogs
## 4    carrots     bla
## 5    carrots    cats
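
Note that this only works when `animals` is a factor. Since R 4.0.0, `data.frame()` no longer converts strings to factors by default, so the column may need an explicit `factor()` call first. The sketch below uses made-up data with an extra rare value ("bird") to show that several rare levels are merged into a single new level.

```r
animals <- factor(c("cats", "dogs", "dogs", "fish", "cats", "bird"))

# table() on a factor reports counts in level order, so the logical
# index lines up with levels(animals).
levels(animals)[table(animals) < 2] <- "bla"

levels(animals)
# "bla"  "cats" "dogs"
as.character(animals)
# "cats" "dogs" "dogs" "bla"  "cats" "bla"
```

Assigning the same name to several levels is what collapses them into one, so no separate merge step is needed.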

Answer 3

We can use data.table. To keep only the rows whose combination of vegetables and animals occurs more than once:

library(data.table)
setDT(df)[df[,  .I[.N > 1], by = .(vegetables, animals)]$V1]

If we want to replace the low-frequency items in each column with 'bla':

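threshold <- 1
df[] <- lapply(df, as.character)  # work on character columns rather than factors
setDT(df)
for (j in seq_along(df)) {
  # count rows per value of column j, overwrite values seen `threshold` times,
  # then drop the helper column N
  df[, N := .N, c(names(df)[j])][N == threshold, names(df)[j] := "bla"][, N := NULL][]
}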
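
For readers without data.table, the same per-value count can be approximated in base R with `ave()`. This is a different technique from the one above, shown only as a rough sketch; the threshold of 2 is an assumption taken from the question.

```r
animals <- c("cats", "dogs", "dogs", "fish", "cats")

# For each element, the number of elements sharing its value
# (the base R counterpart of N := .N grouped by the column).
n <- ave(seq_along(animals), animals, FUN = length)
n
# 2 2 2 1 2

animals[n < 2] <- "bla"
animals
# "cats" "dogs" "dogs" "bla"  "cats"
```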

Replace values in a factor based on the frequency of its levels

3 answers: