Here is a data frame:
vegetables <- c("carrots", "carrots", "carrots", "carrots", "carrots")
animals <- c("cats", "dogs", "dogs", "fish", "cats")
df <- data.frame(vegetables, animals)
It looks like this:
> df
vegetables animals
1 carrots cats
2 carrots dogs
3 carrots dogs
4 carrots fish
5 carrots cats
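If I want to remove rows containing a level that occurs fewer than, say, 2 times (such as fish in df), I can delete those rows like this: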
for (i in names(df)) {
  # keep only rows whose value in column i occurs at least twice in that column
  df <- subset(df, df[, i] %in% names(which(table(df[, i]) >= 2)))
}
> df
vegetables animals
1 carrots cats
2 carrots dogs
3 carrots dogs
5 carrots cats
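A minimal sketch of the same row filter, wrapped in a hypothetical helper `drop_rare_rows()` (the name is illustrative, not from the question). Instead of matching against `names(which(table(...)))`, it looks up each row's count directly with `table(x)[x]`, and like the loop it filters one column at a time:

```r
# Hypothetical helper: drop rows whose value in any column occurs fewer than
# `min_count` times in that column. Columns are processed one after another,
# as in the loop above, so counts are recomputed after each column's filter.
drop_rare_rows <- function(df, min_count = 2) {
  for (col in names(df)) {
    x <- as.character(df[[col]])        # works for factor or character columns
    counts <- as.vector(table(x)[x])    # for each row, how often its value occurs
    df <- df[counts >= min_count, , drop = FALSE]
  }
  df
}

vegetables <- c("carrots", "carrots", "carrots", "carrots", "carrots")
animals <- c("cats", "dogs", "dogs", "fish", "cats")
df <- data.frame(vegetables, animals)
drop_rare_rows(df)
```

Row names are preserved, so the result keeps rows 1, 2, 3 and 5 as in the output above.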
But what if, instead of deleting the observation, I want to replace fish with "bla"?
How can I do that?
Desired output:
> df
vegetables animals
1 carrots cats
2 carrots dogs
3 carrots dogs
4 carrots bla
5 carrots cats
Answer 0 (score: 4)
Not sure whether the factor levels of the variables matter. If they don't, you can build the data.frame with stringsAsFactors=FALSE and do the following:
vegetables <- c("carrots", "carrots", "carrots", "carrots", "carrots")
animals <- c("cats", "dogs", "dogs", "fish", "cats")
DF <- data.frame(vegetables, animals,stringsAsFactors=FALSE)
threshold = 2
# use %in% rather than ==, so that every rare level is replaced when there is more than one
DF$animals[DF$animals %in% names(which(table(DF$animals) < threshold))] <- "foo"
DF
# vegetables animals
#1 carrots cats
#2 carrots dogs
#3 carrots dogs
#4 carrots foo
#5 carrots cats
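The `%in%` form matters as soon as more than one level is rare: `==` compares element by element against a recycled vector of rare names, so it only lines up when exactly one level is rare. A minimal sketch wrapping this approach in a hypothetical helper `replace_rare()`, checked on a made-up variant of the data with two singleton animals (the extra `"birds"` entry is for illustration only):

```r
# Hypothetical helper: replace values of a character vector that occur
# fewer than `threshold` times with `other`.
replace_rare <- function(x, threshold = 2, other = "bla") {
  rare <- names(which(table(x) < threshold))  # values below the threshold
  x[x %in% rare] <- other                     # %in% handles any number of them
  x
}

animals2 <- c("cats", "dogs", "dogs", "fish", "birds", "cats")
replace_rare(animals2)
# returns c("cats", "dogs", "dogs", "bla", "bla", "cats")
```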
Answer 1 (score: 3)
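You can use table() to index the levels you want to change and update them in place:
df$animals <- factor(df$animals)  # levels() needs a factor; data.frame() stopped creating factors by default in R 4.0.0
levels(df$animals)[table(df$animals) < 2] <- 'bla'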
df
## vegetables animals
## 1 carrots cats
## 2 carrots dogs
## 3 carrots dogs
## 4 carrots bla
## 5 carrots cats
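This works because `table()` on a factor returns its counts in the same order as `levels()`, so the logical vector `table(f) < 2` indexes the matching levels. When several levels are rare, giving them all the same label merges them into one level. A small sketch on a made-up factor with two rare levels:

```r
# "birds" and "fish" each occur once, so both are rare
f <- factor(c("cats", "dogs", "dogs", "fish", "birds", "cats"))
levels(f)              # "birds" "cats" "dogs" "fish"
as.vector(table(f))    # 1 2 2 1, in the same order as levels(f)

# Both rare levels get the label "bla" and are merged into a single level
levels(f)[table(f) < 2] <- "bla"
levels(f)              # "bla" "cats" "dogs"
as.character(f)        # "cats" "dogs" "dogs" "bla" "bla" "cats"
```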
Answer 2 (score: 2)
We can use data.table. To drop the rows whose (vegetables, animals) combination occurs only once:
library(data.table)
setDT(df)[df[, .I[.N > 1], by = .(vegetables, animals)]$V1]
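Grouping by every column and keeping groups with `.N > 1` amounts to keeping rows whose full combination of values appears more than once. A base-R sketch of that equivalence, using `duplicated()` in both directions (a swapped-in technique, not part of the answer). One difference worth noting: the grouped `.I` lookup collects row indices group by group, whereas this keeps the original row order.

```r
vegetables <- c("carrots", "carrots", "carrots", "carrots", "carrots")
animals <- c("cats", "dogs", "dogs", "fish", "cats")
df <- data.frame(vegetables, animals, stringsAsFactors = FALSE)

# A row is kept if an identical row appears before it or after it,
# i.e. its (vegetables, animals) combination occurs more than once.
keep <- duplicated(df) | duplicated(df, fromLast = TRUE)
df[keep, ]
```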
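If we instead want to replace the low-frequency items in each column with 'bla':
threshold <- 1
df[] <- lapply(df, as.character)  # work with plain character columns rather than factors
setDT(df)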
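for (j in seq_along(df)) {
  col <- names(df)[j]
  # count each value in this column (N), replace values seen at most
  # `threshold` times with "bla", then drop the helper column N
  df[, N := .N, by = col][N <= threshold, (col) := "bla"][, N := NULL]
}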
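For comparison, the same per-column replacement can be sketched in base R with `ave()`, which computes a group-wise statistic and returns it aligned with the original positions. This is a swapped-in base-R technique, not part of the data.table answer; here `threshold` means "replace values seen at most this many times", mirroring `threshold <- 1` above.

```r
vegetables <- c("carrots", "carrots", "carrots", "carrots", "carrots")
animals <- c("cats", "dogs", "dogs", "fish", "cats")
df <- data.frame(vegetables, animals, stringsAsFactors = FALSE)

threshold <- 1
df[] <- lapply(df, function(x) {
  x <- as.character(x)
  # for every element, the size of its group, i.e. how often that value
  # occurs in this column
  n <- ave(seq_along(x), x, FUN = length)
  x[n <= threshold] <- "bla"
  x
})
df
```

Only `fish` occurs once, so it becomes `bla`; the `vegetables` column is untouched because `carrots` occurs five times.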