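I want to combine all factor levels that occur fewer than n times into a single level named "Else".
For example, if n = 3, then in the df below I want to combine "c", "d" and "e" as "Else":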
df = data.frame(x=c(1:10), y=c("a","a","a","b","b","b","c","d","d","e"))
I started by building a df that contains all of the low-count values:
library(plyr)
lowcounts = ddply(df, "y", function(z){if(nrow(z)<3) nrow(z) else NULL})
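I know I could change these by hand, but in practice I have dozens of levels, so I need to automate this.
I would like to select the levels of df$y that are %in% lowcounts and rename them, leaving the rest unchanged, but I am not sure how to proceed.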
Answer 0 (score: 2)
Why not something like this?
library(data.table)
dt <- data.table(df)
dt[,ynew := ifelse(.N < 3, "else",as.character(y)), by = "y"]
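For readers who would rather not depend on data.table, the same idea can be sketched in base R. This is only an illustration and not part of the answer above: the variable name group_size is made up here, and ave() is used to attach each row's group count.

```r
# Same example data as in the question
df <- data.frame(x = 1:10,
                 y = c("a", "a", "a", "b", "b", "b", "c", "d", "d", "e"))

# group_size (illustrative name): for every row, how many rows share its y value
group_size <- ave(seq_along(df$y), df$y, FUN = length)

# Rows in groups with fewer than 3 members get "else"; the rest keep their label
df$ynew <- ifelse(group_size < 3, "else", as.character(df$y))
```

As in the data.table version, ynew is a character column; wrap it in factor() if a factor is needed.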
Answer 1 (score: 2)
Another option:
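# your data frame
# (stringsAsFactors = TRUE is needed in R >= 4.0, where data.frame() no longer
# converts strings to factors and levels<- below would not relabel anything)
df = data.frame(x=c(1:10), y=c("a","a","a","b","b","b","c","d","d","e"), stringsAsFactors = TRUE)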
#which levels to keep and which to change
res <- table(df$y)
notkeep <- names(res[res < 3])
keep <- names(res)[!names(res) %in% notkeep]
names(keep) <- keep
#set new levels
levels(df$y) <- c(keep, list("else" = notkeep))
df
# x y
#1 1 a
#2 2 a
#3 3 a
#4 4 b
#5 5 b
#6 6 b
#7 7 else
#8 8 else
#9 9 else
#10 10 else
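
If this has to be repeated for several columns or thresholds, the approach can be wrapped in a small function. The following is a minimal sketch, not something from the thread: lump_rare_levels and its arguments f, n and other are hypothetical names. It relies on the documented behaviour that assigning duplicated names to levels() merges those levels.

```r
# Hypothetical helper: lump every level of `f` that occurs fewer than `n`
# times into a single level named `other`.
lump_rare_levels <- function(f, n, other = "Else") {
  f <- as.factor(f)                       # also accepts character vectors
  counts <- table(f)
  rare <- names(counts)[counts < n]       # levels below the threshold
  # Giving several levels the same name merges them into one level
  levels(f)[levels(f) %in% rare] <- other
  f
}

df <- data.frame(x = 1:10,
                 y = c("a", "a", "a", "b", "b", "b", "c", "d", "d", "e"))
df$y <- lump_rare_levels(df$y, n = 3)
```

With n = 3 on the example data, "c", "d" and "e" collapse into "Else" while "a" and "b" are left alone. If no level falls below the threshold, the factor is returned unchanged.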