Question

I have a data frame in which every value is a factor:

V1 V2 V3
 a  b  c
 c  b  a
 c  b  c
 b  b  a

How can I convert every value in the data frame to a number (a to 1, b to 2, c to 3, and so on)?

Answer 1

I would try:

> mydf[] <- as.numeric(factor(as.matrix(mydf)))
> mydf
  V1 V2 V3
1  1  2  3
2  3  2  1
3  3  2  3
4  2  2  1
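
To see why the one-liner works, it may help to split it into steps. This is only a sketch: the data.frame() call that rebuilds mydf is an assumption (the question only shows the printed table), and m, f and codes are names made up for illustration.

```r
# Rebuild the question's data (assumed construction, not from the original post)
mydf <- data.frame(
  V1 = c("a", "c", "c", "b"),
  V2 = c("b", "b", "b", "b"),
  V3 = c("c", "a", "c", "a"),
  stringsAsFactors = TRUE
)

m <- as.matrix(mydf)     # character matrix holding the labels of every cell
f <- factor(m)           # one factor over all cells; levels are sorted: a, b, c
codes <- as.numeric(f)   # the integer codes, stored as doubles, column by column

mydf[] <- codes          # mydf[] keeps the data frame's shape and column names
mydf
```

Because factor() sorts the labels when no levels are given, the codes follow alphabetical order regardless of which levels each individual column had.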

Answer 2

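Converting from factor to numeric gives the integer codes. However, if the levels of a factor column were specified as c('b', 'a', 'c', 'd') or c('c', 'b', 'a'), the integer values follow that order rather than alphabetical order. To avoid this, it is safer to call factor again with the levels specified explicitly: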

df1[] <- lapply(df1, function(x) 
                as.numeric(factor(x, levels=letters[1:3])))
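
A small sketch of the difference, using a hypothetical vector x whose levels were declared in reverse order (x, declared and releveled are illustrative names, not part of the question's data):

```r
# Hypothetical factor with levels declared as c, b, a
x <- factor(c("a", "c", "b"), levels = c("c", "b", "a"))

declared  <- as.numeric(x)                                 # codes follow c, b, a
releveled <- as.numeric(factor(x, levels = letters[1:3]))  # codes follow a, b, c

declared    # 3 1 2
releveled   # 1 3 2
```

Only the second result gives the a = 1, b = 2, c = 3 mapping the question asks for.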

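If we are using data.table, one option is set, which updates the columns by reference and should be more efficient for large datasets. Converting to a matrix may cause memory problems.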

library(data.table)
setDT(df1)
for(j in seq_along(df1)){
 set(df1, i=NULL, j=j, 
     value= as.numeric(factor(df1[[j]], levels= letters[1:3])))
 }

Answer 3

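This approach is similar to Ananda's, but uses unlist() instead of factor(as.matrix()). Since all of your columns are already factors, unlist() will combine them into a single factor vector with the appropriate levels.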

So let's take a look at what happens when we unlist() the data frame.

unlist(df, use.names = FALSE)
#  [1] a c c b b b b b c a c a
# Levels: a b c

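Now we can run as.integer() (or, in R versions before 4.1.0, c()) on the result above, because the factor's integer codes match the mapping you want. The following therefore recodes the entire data frame.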

df[] <- as.integer(unlist(df, use.names = FALSE))
## note: in R < 4.1.0 you could also drop the factor class with c();
## since R 4.1.0, c() on a factor returns a factor, so prefer as.integer()
## df[] <- c(unlist(df, use.names = FALSE))
df
#   V1 V2 V3
# 1  1  2  3
# 2  3  2  1
# 3  3  2  3
# 4  2  2  1

Note: use.names = FALSE is not necessary. However, dropping the names attribute makes this process more efficient.
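
One caveat that may be worth checking on your own data: unlist() builds the combined levels as the union of each column's levels in the order they are encountered, not alphabetically. With the data here that happens to give a, b, c, but a sketch with a hypothetical data frame df2 (not from the question) shows how it can differ:

```r
# Hypothetical data: the first column contains only "b"
df2 <- data.frame(V1 = factor(c("b", "b")), V2 = factor(c("a", "c")))

u <- unlist(df2, use.names = FALSE)
levels(u)       # "b" "a" "c": order of appearance across the columns
as.integer(u)   # 1 1 2 3, so b maps to 1 here

# Re-levelling first restores the a = 1, b = 2, c = 3 mapping
fixed <- as.integer(factor(u, levels = c("a", "b", "c")))
fixed           # 2 2 1 3
```

If the first column does not already carry the levels in the order you want, re-levelling as in Answer 2 is the safer route.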

Data:

df <- structure(list(V1 = structure(c(1L, 3L, 3L, 2L), .Label = c("a", "b", "c"), class = "factor"), V2 = structure(c(1L, 1L, 1L, 1L ), .Label = "b", class = "factor"), V3 = structure(c(2L, 1L, 2L, 1L), .Label = c("a", "c"), class = "factor")), .Names = c("V1", "V2", "V3"), class = "data.frame", row.names = c(NA, -4L))
