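I have a large data frame and I want to filter its columns. Basically, I want to keep the columns whose entries are greater than k in N% of the rows. Can someone help me do this in R? I am new to R.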
Answer 0 (score: 3)
It would be good to have a reproducible example.
I will use the diamonds data set (from the ggplot2 package) as an illustration.
library(ggplot2)  # provides the diamonds data set
data(diamonds)

keepCol <- function(df, K, N) {
  # df: data.frame
  # K:  threshold value
  # N:  required fraction of rows (e.g. 0.2 for 20%)

  # Number of rows that must fulfill the criterion (N% of all rows)
  minRows <- N * nrow(df)

  # Keep only the numeric columns (is.numeric also covers integer columns)
  isNum <- vapply(df, is.numeric, logical(1))
  df <- df[, names(df)[isNum], drop = FALSE]

  # For each numeric column, count the entries greater than K (NAs ignored)
  # and flag the columns where that count exceeds the required number of rows
  keep <- colSums(df > K, na.rm = TRUE) > minRows

  # Keep only those columns
  df[, names(keep)[keep], drop = FALSE]
}

keepCol(diamonds, K = 4, N = 0.2)
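To check the logic by hand, it may help to run the same idea on a tiny data frame where the expected result is easy to see. The sketch below is self-contained base R; the helper name keep_cols_above and the toy data are made up for illustration and are not part of the answer above.

```r
# Hypothetical helper: keep the numeric columns in which more than `frac`
# of the rows hold a value greater than `k`.
keep_cols_above <- function(df, k, frac) {
  # Restrict to numeric columns; drop = FALSE keeps a data frame
  is_num <- vapply(df, is.numeric, logical(1))
  num <- df[, names(df)[is_num], drop = FALSE]

  # Share of rows above the threshold, per column (NAs count as "not above")
  share <- colSums(num > k, na.rm = TRUE) / nrow(num)

  num[, names(share)[share > frac], drop = FALSE]
}

# Toy data (invented): with k = 4, "a" is above the threshold in 4 of 5 rows,
# "b" in 1 of 5, "c" in all 5; "d" is not numeric.
toy <- data.frame(
  a = c(1, 5, 6, 7, 8),
  b = c(1, 2, 3, 9, 1),
  c = c(10, 10, 10, 10, 10),
  d = letters[1:5]
)

result <- keep_cols_above(toy, k = 4, frac = 0.5)
print(names(result))
```

With a 50% cutoff only columns a and c survive. If "at least N%" is what you need rather than "more than N%", change `share > frac` to `share >= frac`.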