How do I select columns whose values are greater than k in N% of the rows?

Time: 2016-06-04 17:01:07

Tags: r

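I have a large data frame and I want to filter its columns. Basically, I want to keep the columns whose entries are greater than k in N% of the rows. Can someone help me do this in R? I am new to R.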

1 Answer:

Answer 0: (score: 3)

It would help to have a reproducible example.

I will use the diamonds dataset as an illustration:

library(ggplot2)  # the diamonds dataset ships with ggplot2
data(diamonds)


keepCol <- function(df, K, N) {
  # df: data.frame
  # K: threshold value
  # N: required fraction of rows (e.g. 0.2 for 20%)

  # How many rows are in the data.frame
  cntRows <- nrow(df)
  # How many rows must fulfill the criterion (N% of all rows)
  minRows <- N * cntRows

  # Keep only the numeric columns
  # (is.numeric() also covers integer columns such as price)
  isNum <- vapply(df, is.numeric, logical(1))
  df <- df[, isNum, drop = FALSE]

  # For each numeric column, count the values greater than K
  # and check whether that count exceeds the required number of rows
  keep <- vapply(df, function(x) sum(x > K, na.rm = TRUE), numeric(1)) > minRows

  # Keep only those columns
  df <- df[, keep, drop = FALSE]

  return(df)
}

keepCol(diamonds, K=4, N=0.2)
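
If you would rather not depend on ggplot2, the same idea can be sketched on a small made-up data frame. This is only an illustration: `toy` and `keep_cols_share` are hypothetical names, not part of any package, and the values are invented. It relies on the fact that the mean of a logical vector is the share of `TRUE` values, so the share of rows above `k` can be compared to `n` directly.

```r
# Toy data frame (made-up values, for illustration only)
toy <- data.frame(
  a = c(1, 5, 6, 7, 8),  # 4 of 5 values are > 4
  b = c(1, 2, 3, 4, 5),  # 1 of 5 values is > 4
  c = c(0, 0, 0, 0, 9),  # 1 of 5 values is > 4
  g = letters[1:5],      # non-numeric column, always dropped
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the numeric columns in which the share of
# values greater than k is larger than n (n is a fraction between 0 and 1)
keep_cols_share <- function(df, k, n) {
  # Restrict to numeric columns (integer columns count as numeric)
  num <- df[, vapply(df, is.numeric, logical(1)), drop = FALSE]
  # mean() of a logical vector gives the proportion of TRUE values
  share <- vapply(num, function(x) mean(x > k, na.rm = TRUE), numeric(1))
  num[, share > n, drop = FALSE]
}

res <- keep_cols_share(toy, k = 4, n = 0.5)
names(res)
```

Here only column `a` has more than half of its values above 4, so it is the only one kept. Note that the comparison is strict (`>`); if you want "at least N%" use `>=` instead.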