Question

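Given a data.frame, I want to test whether all columns are of the same "class". If they are, I want to leave the data.frame as it is. If they are not, I want to keep all columns that match the class of the first variable and drop any columns that are not of that class. The exception is that, for my purposes, integer and numeric count as the same class.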

For example:

dat <- data.frame(numeric,numeric,integer,factor)

would become:

data.frame(numeric,numeric,integer)

Also:

dat <- data.frame(character,character,integer)

would become:

data.frame(character,character)

Finally:

dat <- data.frame(numeric,numeric,numeric,factor)

would become:

data.frame(numeric,numeric,numeric)

Answer 1

I would do it like this:

dat <- data.frame(
  a=as.integer(1:26), b=as.integer(26:1), c=as.numeric(1:26), d=as.factor(1:26)
)

Create two helper functions:

is.numint <- function(x)is.numeric(x) || is.integer(x)
is.charfact <- function(x)is.character(x) || is.factor(x)
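A quick check of how the two helpers classify individual vectors (a sketch using only base R). Note that `is.numeric()` already returns `TRUE` for integer vectors, so the `|| is.integer(x)` branch in `is.numint` is redundant but harmless. Also note that a logical vector matches neither helper.

```r
is.numint <- function(x) is.numeric(x) || is.integer(x)
is.charfact <- function(x) is.character(x) || is.factor(x)

# is.numeric() is already TRUE for integers and FALSE for factors
is.numeric(1L)            # TRUE
is.numint(2.5)            # TRUE
is.numint(factor("a"))    # FALSE
is.charfact("a")          # TRUE
is.charfact(factor("a"))  # TRUE
# Logicals fall into neither group
is.numint(TRUE)           # FALSE
is.charfact(TRUE)         # FALSE
```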

Return only the numeric columns:

head(dat[, sapply(dat, is.numint)])
    a  b  c
1   1 26  1
2   2 25  2
3   3 24  3
4   4 23  4
5   5 22  5
6   6 21  6

Return only the character/factor columns:

head(dat[, sapply(dat, is.charfact), drop=FALSE])
  d
1 1
2 2
3 3
4 4
5 5
6 6

Putting this approach together, here is a rewrite of your function:

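dropext <- function(x){
  is.numint <- function(v) is.numeric(v) || is.integer(v)
  is.charfact <- function(v) is.character(v) || is.factor(v)
  # Start from each column's own class so that columns which are neither
  # numeric nor character/factor (e.g. logical, Date) are not left as NA
  cl <- vapply(x, function(col) class(col)[1], character(1))
  cl[sapply(x, is.numint)] <- "num"
  cl[sapply(x, is.charfact)] <- "char"
  # Keep the columns that fall in the same group as the first column
  x[, cl == cl[1], drop=FALSE]
}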

dropext(dat)
    a  b  c
1   1 26  1
2   2 25  2
3   3 24  3
4   4 23  4
5   5 22  5
6   6 21  6
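To check the rewrite against all three examples from the question, here is a self-contained sketch. The small data frames `ex1` to `ex4` are made up for illustration. This copy of `dropext` seeds `cl` with each column's own `class()` rather than `NA`. With an `NA` seed, a column that is neither numeric nor character/factor (such as a logical) leaves an `NA` in the logical column index, which can make the subsetting fail.

```r
dropext <- function(x) {
  is.numint <- function(v) is.numeric(v) || is.integer(v)
  is.charfact <- function(v) is.character(v) || is.factor(v)
  # Seed with the real class so "other" columns never become NA
  cl <- vapply(x, function(col) class(col)[1], character(1))
  cl[vapply(x, is.numint, logical(1))] <- "num"
  cl[vapply(x, is.charfact, logical(1))] <- "char"
  x[, cl == cl[1], drop = FALSE]
}

# Example 1: numeric, numeric, integer, factor
ex1 <- data.frame(a = c(1.5, 2.5), b = c(3, 4), c = 1:2, d = factor(c("x", "y")))
names(dropext(ex1))  # "a" "b" "c"

# Example 2: character, character, integer
ex2 <- data.frame(a = c("p", "q"), b = c("r", "s"), c = 1:2, stringsAsFactors = FALSE)
names(dropext(ex2))  # "a" "b"

# Example 3: numeric, numeric, numeric, factor
ex3 <- data.frame(a = 1, b = 2, c = 3, d = factor("z"))
names(dropext(ex3))  # "a" "b" "c"

# A logical column is simply treated as its own group and dropped
ex4 <- data.frame(a = 1:2, flag = c(TRUE, FALSE))
names(dropext(ex4))  # "a"
```

Note that this approach puts character and factor columns in the same group, which matches the question's second example.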

Answer 2

How about:

if(length(unique(cl <- sapply(dat, class))) > 1 && 
   any(!sapply(dat, is.numeric))) {
    dat <- dat[ , which(cl == cl[1]), drop = FALSE]
}

This assumes that, in an example such as the following:

dat2 <- data.frame(A = factor(sample(LETTERS, 26, replace = TRUE)),
                   B = factor(sample(LETTERS, 26, replace = TRUE)),
                   C = sample(LETTERS, 26, replace = TRUE),
                   dat, stringsAsFactors = FALSE)


> sapply(dat2, class)
               A                B                C 
        "factor"         "factor"      "character" 
as.integer.1.26. as.integer.26.1. as.numeric.1.26. 
       "integer"        "integer"        "numeric"

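you want only the factor variables. In other words, it assumes you want to distinguish between character and factor variables, which is what your code appears to do.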

For this example I used

if(length(unique(cl <- sapply(dat2, class))) > 1 &&
   any(!sapply(dat2, is.numeric))) {
    dat2 <- dat2[ ,which(cl == cl[1]), drop = FALSE]
}

which gives

> head(dat2)
  A B
1 D G
2 P D
3 C T
4 X F
5 N R
6 A E
> sapply(dat2, class)
       A        B 
"factor" "factor"

On dat, the if() statement above leaves dat unchanged:

>     if(length(unique(cl <- sapply(dat, class))) > 1 && 
+         any(!sapply(dat, is.numeric))) {
+         dat <- dat[ , which(cl == cl[1]), drop = FALSE]
+     }
> head(dat)
  as.integer.1.26. as.integer.26.1. as.numeric.1.26.
1                1               26                1
2                2               25                2
3                3               24                3
4                4               23                4
5                5               22                5
6                6               21                6
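One thing worth checking: because this snippet compares the raw `class()` strings, it treats `integer` and `numeric` as different classes as soon as any non-numeric column is present. A minimal sketch that wraps the snippet unchanged in a function shows this. The name `drop_by_first_class` and the data frames are hypothetical. On the question's first example, the integer column is dropped along with the factor, while an all-numeric/integer data frame is returned unchanged.

```r
# Hypothetical wrapper around the snippet above; the logic is unchanged
drop_by_first_class <- function(dat) {
  # Subset only when the classes differ AND at least one column is non-numeric
  if (length(unique(cl <- sapply(dat, class))) > 1 &&
      any(!sapply(dat, is.numeric))) {
    dat <- dat[, which(cl == cl[1]), drop = FALSE]
  }
  dat
}

# numeric, numeric, integer, factor
ex1 <- data.frame(a = c(1.5, 2.5), b = c(3, 4), c = 1:2, d = factor(c("x", "y")))
names(drop_by_first_class(ex1))     # "a" "b": integer column c is dropped too

# numeric and integer only: returned unchanged
ex_num <- data.frame(a = c(1.5, 2.5), c = 1:2)
names(drop_by_first_class(ex_num))  # "a" "c"
```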

Answer 3

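Thanks for the comments and your answers. In the end, what I needed was a class() function that does not distinguish between integer and numeric. This can be done with a simple wrapper: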

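class.wrap <- function(x) {
  test <- class(x)
  # class() can return more than one string (e.g. c("ordered", "factor")),
  # so replace element-wise instead of using a length-1 if() test
  test[test == "integer"] <- "numeric"
  return(test)
}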
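As a sketch of how such a wrapper could drive the column filtering described in the question: compute each column's wrapped class, then keep the columns matching the first one. The helper name `keep_first_class` and the example data frames are hypothetical. This `class.wrap` replaces `"integer"` element-wise rather than with `if()`, so multi-class columns do not trigger a length-greater-than-one condition. If all columns already share a class, the data frame comes back unchanged.

```r
# class() replacement that maps "integer" to "numeric"
class.wrap <- function(x) {
  test <- class(x)
  test[test == "integer"] <- "numeric"
  test
}

# Keep the columns whose (wrapped) primary class matches the first column's
keep_first_class <- function(df) {
  cl <- vapply(df, function(col) class.wrap(col)[1], character(1))
  df[, cl == cl[1], drop = FALSE]
}

# numeric, numeric, integer, factor
ex1 <- data.frame(a = c(1.5, 2.5), b = c(3, 4), c = 1:2, d = factor(c("x", "y")))
names(keep_first_class(ex1))  # "a" "b" "c"

# character, character, integer
ex2 <- data.frame(a = c("p", "q"), b = c("r", "s"), c = 1:2, stringsAsFactors = FALSE)
names(keep_first_class(ex2))  # "a" "b"
```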

An alternative to class() that does not distinguish between "numeric" and "integer"
