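I'm interested in a method that would automatically convert a data frame containing factor columns (like df) to the best types, as if it had been created by read.table (like df2). One possibility is to write the data frame to a string and read it back with read.table. Are there any others?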
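> # stringsAsFactors = TRUE reproduces the factor columns shown below on R >= 4.0.0, where the default is FALSE
> df <- data.frame(a=c(" 1"," 2", " 3"),b=c("a","b","c"),c=c(" 1.0", "NA", " 2.0"),d=c(" 1", "B", "2"), stringsAsFactors=TRUE)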
> str(df)
'data.frame': 3 obs. of 4 variables:
$ a: Factor w/ 3 levels " 1"," 2"," 3": 1 2 3
$ b: Factor w/ 3 levels "a","b","c": 1 2 3
$ c: Factor w/ 3 levels " 1.0"," 2.0",..: 1 3 2
$ d: Factor w/ 3 levels " 1","2","B": 1 3 2
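> # Go through as.character() first; as.integer()/as.numeric() on a factor return its internal codes, not its labels
> df2 <- with(df, data.frame(a=as.integer(as.character(a)),b=b,c=as.numeric(as.character(c)),d=as.character(d), stringsAsFactors=FALSE))
> str(df2)
'data.frame': 3 obs. of 4 variables:
$ a: int 1 2 3
$ b: Factor w/ 3 levels "a","b","c": 1 2 3
$ c: num 1 NA 2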
$ d: chr " 1" "B" "2"
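The round trip mentioned in the question (write the data frame out as text, then read it back with read.table) could look roughly like the sketch below. The tab separator, `quote = FALSE`, and the name `df_rt` are assumptions made for illustration; values that themselves contain tabs or quotes would need more careful handling.

```r
df <- data.frame(a = c(" 1", " 2", " 3"), b = c("a", "b", "c"),
                 c = c(" 1.0", "NA", " 2.0"), d = c(" 1", "B", "2"),
                 stringsAsFactors = TRUE)

# Write the data frame to a character vector (one element per line) instead of a file
txt <- capture.output(
  write.table(df, sep = "\t", quote = FALSE, row.names = FALSE)
)

# Read it back so that read.table chooses the column types
df_rt <- read.table(text = txt, sep = "\t", header = TRUE,
                    stringsAsFactors = TRUE)
str(df_rt)
```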
Answer 0 (score: 3)
Use the function that read.table uses: type.convert.
Example:
# stringsAsFactors = TRUE reproduces the factor columns on R >= 4.0.0
df <- data.frame(a=c(" 1"," 2", " 3"), b=c("a","b","c"),
                 c=c(" 1.0", "NA", " 2.0"), d=c(" 1", "B", "2"),
                 stringsAsFactors=TRUE)
str(df)
# 'data.frame': 3 obs. of 4 variables:
# $ a: Factor w/ 3 levels " 1"," 2"," 3": 1 2 3
# $ b: Factor w/ 3 levels "a","b","c": 1 2 3
# $ c: Factor w/ 3 levels " 1.0"," 2.0",..: 1 3 2
# $ d: Factor w/ 3 levels " 1","2","B": 1 3 2
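# as.is = FALSE keeps non-numeric columns as factors; newer versions of R expect as.is to be given explicitly
df[] <- lapply(df, function(y) type.convert(as.character(y), as.is = FALSE))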
df
#   a b  c  d
# 1 1 a  1  1
# 2 2 b NA  B
# 3 3 c  2  2
str(df)
# 'data.frame': 3 obs. of 4 variables:
# $ a: int 1 2 3
# $ b: Factor w/ 3 levels "a","b","c": 1 2 3
# $ c: num 1 NA 2
# $ d: Factor w/ 3 levels " 1","2","B": 1 3 2
(But I'm not sure whether this is what you're looking for...)
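If this gets applied to several data frames, the same idea can be wrapped in a small helper. The following is only a sketch: the name `autoConvert` and its `as.is` argument are made up for illustration, and it touches only factor and character columns so that columns which already have a proper type are left alone.

```r
# Hypothetical helper (not part of base R): re-guess the type of text-like columns
autoConvert <- function(inDF, as.is = FALSE) {
  # Find the columns that are stored as factor or character
  isText <- vapply(inDF, function(col) is.factor(col) || is.character(col),
                   logical(1))
  if (!any(isText)) return(inDF)
  # Convert via as.character() so factors are parsed by label, not by code
  inDF[isText] <- lapply(inDF[isText], function(col) {
    type.convert(as.character(col), as.is = as.is)
  })
  inDF
}

df <- data.frame(a = c(" 1", " 2", " 3"), b = c("a", "b", "c"),
                 c = c(" 1.0", "NA", " 2.0"), d = c(" 1", "B", "2"),
                 stringsAsFactors = TRUE)
res_factor <- autoConvert(df)               # text columns end up as factors
res_char   <- autoConvert(df, as.is = TRUE) # text columns end up as character
str(res_factor)
str(res_char)
```

Passing `as.is = TRUE` matches what read.table does with `stringsAsFactors = FALSE`.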
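Update: If you want a colClasses-style function, perhaps you could try something like the one below. Unlike what the title of your question asks for, this is not "automatic", but it lets you specify the class of each column yourself instead of leaving it to type.convert to decide.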
toColClasses <- function(inDF, colClasses) {
  if (length(colClasses) != length(inDF))
    stop("Please specify colClasses for each column")
  inDF[] <- lapply(seq_along(colClasses), function(y) {
    # An empty string means "leave this column unchanged"
    if (colClasses[y] == "") return(inDF[[y]])
    # Look up the conversion function by name, e.g. "as.integer"
    FUN <- match.fun(colClasses[y])
    # Convert via character so that factors use their labels, not their codes
    suppressWarnings(FUN(as.character(inDF[[y]])))
  })
  inDF
}
You can use it as follows:
# stringsAsFactors = TRUE reproduces the factor columns on R >= 4.0.0
df <- data.frame(a = c(" 1"," 2", " 3"), b = c("a","b","c"),
                 c = c(" 1.0", "NA", " 2.0"), d = c(" 1", "B", "2"),
                 stringsAsFactors = TRUE)
df2 <- toColClasses(df, c("as.integer", "", "as.numeric", "as.character"))
df2
#   a b  c  d
# 1 1 a  1  1
# 2 2 b NA  B
# 3 3 c  2  2
str(df2)
# 'data.frame': 3 obs. of 4 variables:
# $ a: int 1 2 3
# $ b: Factor w/ 3 levels "a","b","c": 1 2 3
# $ c: num 1 NA 2
# $ d: chr " 1" "B" "2"
You would need to do some more work on the function to get it to accept a wider range of as... functions.
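One possible direction for that extra work, offered only as a sketch: let `colClasses` be a list whose entries are either a function name, a function object, or "" for "leave unchanged". The name `toColClasses2` and the sample data frame `df3` are invented for this illustration and are not part of the answer above.

```r
# Hypothetical variant: accepts function names or function objects per column
toColClasses2 <- function(inDF, colClasses) {
  if (length(colClasses) != length(inDF))
    stop("Please specify colClasses for each column")
  inDF[] <- Map(function(col, cls) {
    # "" means "leave this column unchanged"
    if (identical(cls, "")) return(col)
    # match.fun() accepts both a function and the name of a function
    FUN <- match.fun(cls)
    FUN(as.character(col))
  }, inDF, colClasses)
  inDF
}

df3 <- data.frame(n = c(" 1", " 2"),
                  when = c("2020-01-01", "2020-02-01"),
                  keep = c("x", "y"),
                  stringsAsFactors = TRUE)
out <- toColClasses2(df3, list("as.integer", as.Date, ""))
str(out)
```

Unlike the original, this version does not suppress warnings, so values that fail to convert are reported rather than silently turned into NA.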