Question

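I have a wide data.frame (df1) in which every column is a character vector. I also have a separate vector (vec1) that holds the class I want to assign to each column of df1.

If I were using read.csv(), I would use the colClasses argument and set it to vec1, but there does not appear to be a similar option for an existing data.frame.

Any suggestions for a fast way to do this, other than looping?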

Answer 1

I don't know whether it will help, but I have run into the same need many times and wrote a function for it:

reclass <- function(df, vec){
  df[] <- Map(function(x, f){
    # switch maps each accepted short name in vec to a conversion function;
    # you can modify it and/or add more
    f <- switch(f,
                as.is  = 'force',
                factor = 'as.factor',
                num    = 'as.numeric',
                char   = 'as.character')
    # take the name of the function and fetch the function itself
    f <- get(f)
    # apply the function to the column
    f(x)
  },
  df,
  vec)
  df
}

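It uses Map to walk over the columns of the data.frame and the class vector in parallel. Each element of the vector gives the class for the column in the same position, so the vector must have one element per column.

I also use switch so that the class names are shorter to type. Use as.is to keep a column's class unchanged; the other names should be self-explanatory.

Small example (the Factor column in the output reflects R versions before 4.0.0, where data.frame() defaulted to stringsAsFactors = TRUE):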

df1 <- data.frame(1:10, letters[1:10], runif(50))
> str(df1)
'data.frame':   50 obs. of  3 variables:
 $ X1.10        : int  1 2 3 4 5 6 7 8 9 10 ...
 $ letters.1.10.: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ runif.50.    : num  0.0969 0.1957 0.8283 0.1768 0.9821 ...

After applying the function:

df1 <- reclass(df1, c('num','as.is','char'))
> str(df1)
'data.frame':   50 obs. of  3 variables:
 $ X1.10        : num [1:50] 1 2 3 4 5 6 7 8 9 10 ...
 $ letters.1.10.: Factor w/ 10 levels "a","b","c","d",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ runif.50.    : chr [1:50] "0.0968757788650692" "0.19566105119884" "0.828283685725182" "0.176784737734124" ...
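
For an all-character data.frame like the one in the question, a variant of the function above with explicit checks might look like the sketch below. The names converters, reclass_checked and df_chr are hypothetical and only for illustration. It swaps switch plus get for a named list of functions and fails early when the vector length or a key does not match.

```r
# Hypothetical lookup table: short key -> conversion function.
converters <- list(
  as.is  = identity,
  factor = as.factor,
  num    = as.numeric,
  char   = as.character
)

reclass_checked <- function(df, vec) {
  # One key per column is required.
  stopifnot(length(vec) == ncol(df))
  # Fail early on keys that have no converter.
  unknown <- setdiff(vec, names(converters))
  if (length(unknown) > 0) {
    stop("unknown class key(s): ", paste(unknown, collapse = ", "))
  }
  # df[] <- keeps the data.frame structure while replacing every column.
  df[] <- Map(function(x, key) converters[[key]](x), df, vec)
  df
}

# Illustrative all-character data.frame, as described in the question.
df_chr <- data.frame(id  = c("1", "2", "3"),
                     grp = c("a", "b", "a"),
                     val = c("0.5", "1.5", "2.5"),
                     stringsAsFactors = FALSE)

out <- reclass_checked(df_chr, c("num", "factor", "num"))
str(out)
```

Passing stringsAsFactors = FALSE keeps the input columns as character regardless of the R version, which matches the situation described in the question.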

I guess Map is a loop internally, but it is a thin wrapper around mapply, which is implemented in C, so it should be fast enough.
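
To make the relationship between Map and mapply concrete, here is a small sketch. The objects cols and funs are made up for illustration. It runs the same per-column conversion through both functions and compares the results.

```r
# Illustrative inputs: a list of character columns and the names of
# the conversion functions to apply to them, position by position.
cols <- list(a = c("1", "2"), b = c("x", "y"))
funs <- c("as.numeric", "as.character")

# Conversion step: look up the function by name and apply it.
convert <- function(x, f) match.fun(f)(x)

via_map    <- Map(convert, cols, funs)
via_mapply <- mapply(convert, cols, funs, SIMPLIFY = FALSE)

# Both calls should produce the same list of converted columns.
same <- identical(via_map, via_mapply)
print(same)
```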

Answer 2

You could also try this function, which does the same job.

reclass <- function (df, vec_types) {
  # seq_len() is safe even when df has zero columns
  for (i in seq_len(ncol(df))) {
    # set the class of column i to the i-th entry of vec_types
    class(df[ , i]) <- vec_types[i]
  }
  df
}

Here is an example of vec_types (the vector of types):

vec_types <- c('character', rep('integer', 3), rep('character', 2))

You can test the function (reclass) on this table:

table <- data.frame(matrix(sample(1:10, 30, replace = TRUE), nrow = 5, ncol = 6))
str(table)  # original column types

# apply the function
table <- reclass(table, vec_types)
str(table)  # new column types
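
Assigning to class() works for the basic vector types used above, but as far as I know it will not build a valid factor or Date from a character column. A hedged alternative is to look up the matching as.<type> function by name. The sketch below assumes that every entry of vec_types has such a function (as.integer, as.factor, as.Date, and so on). The names reclass_as and tbl are hypothetical.

```r
reclass_as <- function(df, vec_types) {
  # One type name per column is required.
  stopifnot(length(vec_types) == ncol(df))
  for (i in seq_len(ncol(df))) {
    # Assumption: a function named as.<type> exists for each entry.
    convert <- match.fun(paste0("as.", vec_types[i]))
    df[[i]] <- convert(df[[i]])
  }
  df
}

# Illustrative all-character input.
tbl <- data.frame(a = c("1", "2"),
                  b = c("x", "y"),
                  d = c("2020-01-01", "2020-06-30"),
                  stringsAsFactors = FALSE)

res <- reclass_as(tbl, c("integer", "factor", "Date"))
str(res)
```

match.fun() stops with an error when no function of that name exists, so a mistyped type name is reported instead of being silently ignored.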
