Question: Delete columns that are all 0

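I am trying to delete all columns in my data frame that contain only the value 0. I am using the following code, which I found on this site.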

dataset = dataset[ ,colSums(dataset != 0) > 0]

However, I keep getting this error:

Error in `[.data.frame`(dataset, , colSums(dataset != 0) > 0) :
  undefined columns selected

Answer 1

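This is because at least one of your columns contains an NA. colSums() then returns NA for that column, and an NA in the logical column index triggers the "undefined columns selected" error. Fix it like this: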

dataset = dataset[ , colSums(dataset != 0, na.rm = TRUE) > 0]
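
To see where the NA comes from, here is a minimal sketch on a made-up data frame. The name toy and its columns are illustrative assumptions, not data from the question. It compares the column index with and without na.rm = TRUE.

```r
# Hypothetical toy data: column `y` contains one NA, column `z` is all zeros
toy <- data.frame(x = c(1, 0, 2), y = c(0, NA, 3), z = c(0, 0, 0))

# Without na.rm, the count for `y` is NA, so the logical index contains an NA
keep_raw <- colSums(toy != 0) > 0
print(keep_raw)

# With na.rm = TRUE, NAs are skipped and every entry is TRUE or FALSE
keep <- colSums(toy != 0, na.rm = TRUE) > 0
print(keep)

# drop = FALSE keeps the result a data frame even if only one column survives
toy_clean <- toy[, keep, drop = FALSE]
print(names(toy_clean))
```

Indexing with keep_raw instead of keep would reproduce the error from the question, because the index for y is NA.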

Answer 2

The following code checks which columns are numeric (or integer) and deletes those that contain only zeros and NAs:

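# example data
df <- data.frame(
        one = rep(0, 100),
        two = sample(letters, 100, replace = TRUE),
        three = rep(0L, 100),
        four = 1:100,
        stringsAsFactors = FALSE
      )

# create a function that checks numeric columns for all zeros;
# is.numeric() covers both integer and double columns and, unlike
# class(x) %in% ..., always yields a single TRUE/FALSE for if()
only_zeros <- function(x) {
    if (is.numeric(x)) {
        all(x == 0, na.rm = TRUE)
    } else {
        FALSE
    }
}

# apply that function to your data; drop = FALSE keeps a data frame
# even when only one column remains
df_without_zero_cols <- df[ , !sapply(df, only_zeros), drop = FALSE]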
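
The edge cases of this approach can be checked with a small sketch. The data frame toy and the helper only_zeros_or_na are hypothetical names introduced here; the helper is written in the spirit of only_zeros and is not the answer's exact code. Note that a numeric column consisting entirely of NA is also dropped, because all() over an empty set of values is TRUE.

```r
# Hypothetical data: `b` mixes zeros and NA, `c` is all NA, `d` is character
toy <- data.frame(a = c(1, 0), b = c(0, NA), c = c(NA_real_, NA_real_),
                  d = c("0", "0"), stringsAsFactors = FALSE)

# TRUE for numeric columns whose non-NA values are all zero
only_zeros_or_na <- function(x) is.numeric(x) && all(x == 0, na.rm = TRUE)

# vapply() guarantees a logical vector with one entry per column
drop_col <- vapply(toy, only_zeros_or_na, logical(1))
print(drop_col)

# Keep everything that was not flagged; `d` survives because it is not numeric
toy_clean <- toy[, !drop_col, drop = FALSE]
print(names(toy_clean))
```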

Answer 3

There is also an alternative using all():

dataset[, !sapply(dataset, function(x) all(x == 0))]

  a  c  d f
1 1 -1 -1 a
2 2  0 NA a
3 3  1  1 a
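
One caveat worth checking: all(x == 0) returns NA for a column that holds only zeros and NAs, and that NA would break the column index. The sketch below, on a made-up data frame toy (an assumption for illustration), wraps the test in isTRUE() so such a column is kept rather than causing an error. Whether keeping or dropping it is the right choice depends on your data.

```r
# Hypothetical data: `b` is zeros plus an NA, `c` is all zeros
toy <- data.frame(a = 1:3, b = c(0, NA, 0), c = 0)

# all() returns NA for `b`: no value is FALSE, but one is unknown
flags <- sapply(toy, function(x) all(x == 0))
print(flags)

# isTRUE() maps NA to FALSE, so `b` is treated as "not provably all zero"
all_zero <- vapply(toy, function(x) isTRUE(all(x == 0)), logical(1))
kept <- toy[, !all_zero, drop = FALSE]
print(names(kept))
```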

If the dataset is large, the time and memory cost of copying can be avoided by removing the columns by reference.

library(data.table)
cols <- which(sapply(dataset, function(x) all(x == 0)))
setDT(dataset)[, (cols) := NULL]
dataset

   a  c  d f
1: 1 -1 -1 a
2: 2  0 NA a
3: 3  1  1 a

Data

dataset <- data.frame(a = 1:3, b = 0, c = -1:1, d = c(-1, NA, 1), e = 0, f ="a")
dataset

  a b  c  d e f
1 1 0 -1 -1 0 a
2 2 0  0 NA 0 a
3 3 0  1  1 0 a
