Why is data.matrix changing the information in my data frame?

Time: 2015-07-06 14:33:44

Tags: r data-cleaning

I am trying to convert the following data frame to a matrix.

> dput(data)
structure(list(`1` = structure(c(1L, 1L, 3L, 3L, 1L), .Label = c("1", 
"2", "3", "4", "5", "NA"), class = "factor"), `2` = structure(c(5L, 
5L, 2L, 2L, 5L), .Label = c("1", "2", "3", "4", "5", "6", "NA"
), class = "factor"), `3` = structure(c(34L, 46L, 51L, 28L, 13L
), .Label = c("0", "1", "10", "100", "105", "11", "110", "112", 
"12", "120", "14", "15", "16", "168", "18", "2", "20", "200", 
"21", "22", "24", "25", "26", "27", "28", "29", "3", "30", "31", 
"32", "35", "36", "4", "40", "41", "42", "42099", "42131", "42134", 
"42197", "42292", "45", "48", "49", "5", "50", "54", "55", "56", 
"6", "60", "64", "65", "7", "70", "72", "75", "77", "8", "80", 
"82", "84", "85", "9", "90", "NA"), class = "factor"), `4` = structure(c(1L, 
2L, 2L, 1L, 1L), .Label = c("0", "1", "NA"), class = "factor"), 
    `5` = structure(c(1L, 1L, 1L, 1L, 1L), .Label = c("0", "1", 
    "NA"), class = "factor")), .Names = c("1", "2", "3", "4", 
"5"), row.names = c(1L, 2L, 4L, 5L, 6L), class = "data.frame")

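However, when I use data.matrix, the result contains different values from the original data frame. The new data set I get is shown below. Any ideas? I am running R version 3.2.1 on OS X 10.10.4. Thanks in advance.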

> data_cleaned <- data.matrix(data)
> dput(data_cleaned)
structure(c(1L, 1L, 3L, 3L, 1L, 5L, 5L, 2L, 2L, 5L, 34L, 46L, 
    51L, 28L, 13L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Dim = c(5L, 
    5L), .Dimnames = list(c("1", "2", "4", "5", "6"), c("1", "2", 
    "3", "4", "5")))

2 answers:

Answer 0: (score: 1)

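The columns of your data frame are stored as factors. When you call as.numeric on a factor, even one whose labels look like numbers, you get the integer codes of the factor's levels rather than the actual values: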

x = as.factor(c(5,4,3))
as.numeric(x)

But this works:

as.numeric(as.character(x))
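To make the difference concrete, here is a minimal sketch built on the same `x`. The variable names (`lv`, `codes`, `via_character`, `via_levels`) are only illustrative; the last line uses the `as.numeric(levels(x))[x]` idiom as an alternative to going through `as.character`.

```r
# A factor stores integer codes that index into its sorted levels.
x <- as.factor(c(5, 4, 3))

lv <- levels(x)                               # the labels: "3" "4" "5"
codes <- as.numeric(x)                        # the codes: 3 2 1, not the original values

# Two ways to recover the values the labels represent:
via_character <- as.numeric(as.character(x))  # 5 4 3
via_levels <- as.numeric(levels(x))[x]        # 5 4 3, converts each level only once
```

Both conversions return the same values; the second only converts the distinct levels, which may matter for long factors with few levels.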

You could try:

sapply(data, function(x) as.numeric(as.character(x)))

which applies the conversion to every column of the data.frame.
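As a rough illustration of both the problem and the fix, the sketch below uses a small made-up data frame `df` (not the asker's data) whose columns are factors and whose row names have a gap. It compares what `data.matrix` returns with the result of the `sapply` conversion, and copies the row names over afterwards because the `sapply` result does not carry them.

```r
# Hypothetical toy data: factor columns with numeric-looking labels.
df <- data.frame(a = factor(c("10", "200", "3")),
                 b = factor(c("0", "1", "0")),
                 row.names = c("1", "2", "4"))

# data.matrix replaces each factor by its internal integer codes.
codes <- data.matrix(df)

# Converting through character recovers the values the labels represent.
values <- sapply(df, function(x) as.numeric(as.character(x)))
rownames(values) <- rownames(df)  # restore the original row names

print(codes)
print(values)
```

Column `a` comes back as 1, 2, 3 from `data.matrix` but as 10, 200, 3 from the `sapply` conversion, which is the same kind of mismatch seen in the question.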

Answer 1: (score: 0)

Here is another possibility:

size <-dim(data)
m <- matrix(as.numeric(as.matrix(data)),nrow=size[1],ncol=size[2])

#> m
#     [,1] [,2] [,3] [,4] [,5]
#[1,]    1    5   40    0    0
#[2,]    1    5   50    1    0
#[3,]    3    2   60    1    0
#[4,]    3    2   30    0    0
#[5,]    1    5   16    0    0
#> class(m)
#[1] "matrix"
#> str(m)
# num [1:5, 1:5] 1 1 3 3 1 5 5 2 2 5 ...
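Note that the result above has lost the row and column names. If those matter, one possible variation (a sketch on a made-up data frame `df`, not the data from the question) is to change the storage mode of the character matrix in place, which keeps its `dim` and `dimnames` attributes:

```r
# Hypothetical toy data: factor columns with numeric-looking labels.
df <- data.frame(a = factor(c("10", "200", "3")),
                 b = factor(c("0", "1", "0")),
                 row.names = c("1", "2", "4"))

m <- as.matrix(df)           # character matrix holding the factor labels
storage.mode(m) <- "double"  # convert to numeric; attributes are preserved

print(m)
```

Here `m` should keep the row names "1", "2", "4" and the column names "a" and "b".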

Hope this helps.