我在使用data.table时遇到问题:如何转换列类?这是一个简单的例子:使用data.frame我没有转换它的问题,data.table我只是不知道如何:
df <- data.frame(ID=c(rep("A", 5), rep("B",5)), Quarter=c(1:5, 1:5), value=rnorm(10))
#One way: http://stackoverflow.com/questions/2851015/r-convert-data-frame-columns-from-factors-to-characters
df <- data.frame(lapply(df, as.character), stringsAsFactors=FALSE)
#Another way
df[, "value"] <- as.numeric(df[, "value"])
library(data.table)
dt <- data.table(ID=c(rep("A", 5), rep("B",5)), Quarter=c(1:5, 1:5), value=rnorm(10))
dt <- data.table(lapply(dt, as.character), stringsAsFactors=FALSE)
#Error in rep("", ncol(xi)) : invalid 'times' argument
#Produces error, does data.table not have the option stringsAsFactors?
dt[, "ID", with=FALSE] <- as.character(dt[, "ID", with=FALSE])
#Produces error: Error in `[<-.data.table`(`*tmp*`, , "ID", with = FALSE, value = "c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)") :
#unused argument(s) (with = FALSE)
我在这里想念一些明显的东西吗?
由于Matthew的帖子更新:之前我使用的是旧版本,但即使更新到1.6.6(我现在使用的版本)之后,我仍然会收到错误。
更新2:假设我想将类“factor”的每一列转换为“character”列,但事先不知道哪个列属于哪个类。使用data.frame,我可以执行以下操作:
classes <- as.character(sapply(df, class))
colClasses <- which(classes=="factor")
df[, colClasses] <- sapply(df[, colClasses], as.character)
我可以使用data.table做类似的事情吗?
更新3:
sessionInfo() R版本2.13.1(2011-07-08) 平台:x86_64-pc-mingw32 / x64(64位)
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.6.6
loaded via a namespace (and not attached):
[1] tools_2.13.1
答案 0 :(得分:91)
对于单个栏目:
dtnew <- dt[, Quarter:=as.character(Quarter)]
str(dtnew)
Classes ‘data.table’ and 'data.frame': 10 obs. of 3 variables:
$ ID : Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
$ Quarter: chr "1" "2" "3" "4" ...
$ value : num -0.838 0.146 -1.059 -1.197 0.282 ...
使用lapply
和as.character
:
dtnew <- dt[, lapply(.SD, as.character), by=ID]
str(dtnew)
Classes ‘data.table’ and 'data.frame': 10 obs. of 3 variables:
$ ID : Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
$ Quarter: chr "1" "2" "3" "4" ...
$ value : chr "1.487145280568" "-0.827845218358881" "0.028977182770002" "1.35392750102305" ...
答案 1 :(得分:39)
试试这个
DT <- data.table(X1 = c("a", "b"), X2 = c(1,2), X3 = c("hello", "you"))
changeCols <- colnames(DT)[which(as.vector(DT[,lapply(.SD, class)]) == "character")]
DT[,(changeCols):= lapply(.SD, as.factor), .SDcols = changeCols]
答案 2 :(得分:2)
这是一个很糟糕的方法!我只留下这个答案,以防它解决其他奇怪的问题。这些更好的方法可能部分是由于更新的data.table版本的结果......因此值得记录这一难题。另外,这是eval
substitute
语法的一个很好的语法示例。
library(data.table)
dt <- data.table(ID = c(rep("A", 5), rep("B",5)),
fac1 = c(1:5, 1:5),
fac2 = c(1:5, 1:5) * 2,
val1 = rnorm(10),
val2 = rnorm(10))
names_factors = c('fac1', 'fac2')
names_values = c('val1', 'val2')
for (col in names_factors){
e = substitute(X := as.factor(X), list(X = as.symbol(col)))
dt[ , eval(e)]
}
for (col in names_values){
e = substitute(X := as.numeric(X), list(X = as.symbol(col)))
dt[ , eval(e)]
}
str(dt)
给你
Classes ‘data.table’ and 'data.frame': 10 obs. of 5 variables:
$ ID : chr "A" "A" "A" "A" ...
$ fac1: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5
$ fac2: Factor w/ 5 levels "2","4","6","8",..: 1 2 3 4 5 1 2 3 4 5
$ val1: num 0.0459 2.0113 0.5186 -0.8348 -0.2185 ...
$ val2: num -0.0688 0.6544 0.267 -0.1322 -0.4893 ...
- attr(*, ".internal.selfref")=<externalptr>
答案 3 :(得分:1)
提出马特·道尔(Matt Dowle)对Geneorama的回答(https://stackoverflow.com/a/20808945/4241780)的评论,以使其更加明显(鼓励)。
for (col in names_factors)
set(dt, j=col, value=as.factor(dt[[col]]))
此外,在马特的另一则评论中指出,请参见https://stackoverflow.com/a/33000778/4241780以获取更多信息。
答案 4 :(得分:0)
我尝试了几种方法。
# BY {dplyr}
data.table(ID = c(rep("A", 5), rep("B",5)),
Quarter = c(1:5, 1:5),
value = rnorm(10)) -> df1
df1 %<>% dplyr::mutate(ID = as.factor(ID),
Quarter = as.character(Quarter))
# check classes
dplyr::glimpse(df1)
# Observations: 10
# Variables: 3
# $ ID (fctr) A, A, A, A, A, B, B, B, B, B
# $ Quarter (chr) "1", "2", "3", "4", "5", "1", "2", "3", "4", "5"
# $ value (dbl) -0.07676732, 0.25376110, 2.47192852, 0.84929175, -0.13567312, -0.94224435, 0.80213218, -0.89652819...
,或其他
# from list to data.table using data.table::setDT
list(ID = as.factor(c(rep("A", 5), rep("B",5))),
Quarter = as.character(c(1:5, 1:5)),
value = rnorm(10)) %>% setDT(list.df) -> df2
class(df2)
# [1] "data.table" "data.frame"
答案 5 :(得分:0)
我提供了一种更通用,更安全的方式来做这些事情,
".." <- function (x)
{
stopifnot(inherits(x, "character"))
stopifnot(length(x) == 1)
get(x, parent.frame(4))
}
set_colclass <- function(x, class){
stopifnot(all(class %in% c("integer", "numeric", "double","factor","character")))
for(i in intersect(names(class), names(x))){
f <- get(paste0("as.", class[i]))
x[, (..("i")):=..("f")(get(..("i")))]
}
invisible(x)
}
函数..
确保我们从data.table的范围中得到一个变量; set_colclass将设置cols的类。
您可以像这样使用它:
dt <- data.table(i=1:3,f=3:1)
set_colclass(dt, c(i="character"))
class(dt$i)
答案 6 :(得分:0)
这里与@Nera建议先检查类的方式相同,但不是使用.SD
,而是使用data.table的快速循环和set
作为添加了类的@Matt Dowle解决方案检查。
for (j in seq_len(ncol(DT))){
if(class(DT[[j]]) == 'factor')
set(DT, j = j, value = as.character(DT[[j]]))
}
答案 7 :(得分:-1)
如果您在data.table中有一个列名列表,则要更改do的类:
convert_to_character <- c("Quarter", "value")
dt[, convert_to_character] <- dt[, lapply(.SD, as.character), .SDcols = convert_to_character]
答案 8 :(得分:-3)
尝试:
dt <- data.table(A = c(1:5),
B= c(11:15))
x <- ncol(dt)
for(i in 1:x)
{
dt[[i]] <- as.character(dt[[i]])
}