Why do values change when converting from a data.frame to a numeric matrix?

Asked: 2015-03-04 21:22:34

Tags: r matrix dataframe

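I need to convert a data frame to a numeric matrix. However, when I use the data.matrix() function, the decimal values in some columns are replaced by completely different numbers, and I don't know why. Can someone tell me what is going on?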

> head(x[,1:5])
         TCGA-AA-3520-01A-01R-0821-07 TCGA-AA-3532-01A-01R-0821-07 TCGA-AA-3553-01A-01R-0821-07 TCGA-A6-2674-01A-02R-0821-07 TCGA-AA-3521-01A-01R-0821-07
ELMO2              -0.840833333333333                        0.018            0.354916666666667                    -0.203750                    0.6890000
CREB3L1                         1.333                       0.7625                      0.13475                     2.498750                    1.1572500
RPS11                          1.4755                       0.3245                        0.634                     0.483125                    0.9526250
PNMA1                        -1.39075                     -1.48725                      -0.8305                    -0.463250                   -2.2230000
MMP2               0.0278333333333333                      -0.2065           0.0666666666666666                     2.156000                    0.1501667
C10orf90                      -2.5495                     -2.76575                     -2.76375                    -2.482250                   -2.1107500
> head(data.matrix(x[,1:5]))
         TCGA-AA-3520-01A-01R-0821-07 TCGA-AA-3532-01A-01R-0821-07 TCGA-AA-3553-01A-01R-0821-07 TCGA-A6-2674-01A-02R-0821-07 TCGA-AA-3521-01A-01R-0821-07
ELMO2                            3323                           94                         1701                    -0.203750                    0.6890000
CREB3L1                          4307                         3022                          654                     2.498750                    1.1572500
RPS11                            4485                         1458                         2786                     0.483125                    0.9526250
PNMA1                            4379                         4438                         3397                    -0.463250                   -2.2230000
MMP2                              155                          932                          328                     2.156000                    0.1501667
C10orf90                         5139                         5193                         5230                    -2.482250                   -2.1107500
> class(x)
[1] "data.frame"

> str(x)
'data.frame':   6150 obs. of  174 variables:
 $ TCGA-AA-3520-01A-01R-0821-07: Factor w/ 5538 levels "","0","0.000166666666666662",..: 3323 4307 4485 4379 155 5139 4177 1400 4735 3363 ...
 $ TCGA-AA-3532-01A-01R-0821-07: Factor w/ 5597 levels "","0.000499999999999968",..: 94 3022 1458 4438 932 5193 1374 2757 4671 2503 ...
 $ TCGA-AA-3553-01A-01R-0821-07: Factor w/ 5550 levels "","0.000249999999999995",..: 1701 654 2786 3397 328 5230 65 194 4900 3966 ...
 $ TCGA-A6-2674-01A-02R-0821-07: num  -0.204 2.499 0.483 -0.463 2.156 ...
 $ TCGA-AA-3521-01A-01R-0821-07: num  0.689 1.157 0.953 -2.223 0.15 ...
 $ TCGA-AA-3534-01A-01R-0821-07: num  -0.6789 -0.0877 1.5736 -1.6678 -0.7148 ...
 $ TCGA-AA-3555-01A-01R-0821-07: Factor w/ 5580 levels "","-0.00012499999999999",..: 373 4970 2076 519 1344 5084 3882 1285 4760 2778 ...
 $ TCGA-A6-2670-01A-02R-0821-07: num  0.588 0.569 0.808 -1.661 1.073 ...
 $ TCGA-A6-2683-01A-01R-0821-07: num  -0.77 0.741 1.564 -2.984 -1.569 ...
 $ TCGA-AA-3526-01A-02R-0821-07: num  -0.824 2.215 0.819 -1.846 -0.862 ...
 $ TCGA-A6-2677-01A-01R-0821-07: num  -0.733 0.526 0.892 -1.598 -1.69 ...
 $ TCGA-AA-3522-01A-01R-0821-07: num  -0.981 2.094 0.818 -1.048 -1.452 ...
 $ TCGA-AA-3538-01A-01R-0821-07: num  -0.144 0.631 0.794 -1.523 -0.198 ...
 $ TCGA-AA-3556-01A-01R-0821-07: Factor w/ 5556 levels "","-0.000125000000000014",..: 2256 4772 3446 4253 4040 4927 3026 316 3766 3221 ...
 $ TCGA-A6-2678-01A-01R-0821-07: num  -1.38 1.706 1.103 -2.725 -0.918 ...
 $ TCGA-AA-3524-01A-02R-0821-07: Factor w/ 5611 levels "","-0.0005","0.000500000000000006",..: 4062 3671 4749 4751 4051 5226 2623 1227 4252 1489 ...
 $ TCGA-AA-3542-01A-02R-0821-07: num  -1.195 0.641 1.952 -1.63 -1.264 ...
 $ TCGA-AA-3558-01A-01R-0821-07: Factor w/ 5580 levels "","0.000375000000000007",..: 4245 3920 4277 4910 4766 5126 1450 3350 4898 1915 ...
 $ TCGA-AA-3544-01A-01R-0821-07: num  -0.157 0.649 0.937 -1.941 -1.417 ...
 $ TCGA-AA-3560-01A-01R-0821-07: num  -0.146 0.554 0.581 -2.503 -0.438 ...
 $ TCGA-AA-3514-01A-02R-0821-07: Factor w/ 5678 levels "","0","0.000375000000000028",..: 3800 2056 2422 1158 1507 4620 3564 1877 5480 4076 ...
 $ TCGA-AA-3527-01A-01R-0821-07: num  -0.3973 -0.0915 1.4019 -2.5513 -0.395 ...
 $ TCGA-AA-3548-01A-01R-0821-07: Factor w/ 5470 levels "","0.000100000000000011",..: 2590 3817 3388 4531 2770 4922 2715 406 4473 2711 ...
 $ TCGA-AA-3561-01A-01R-0821-07: num  -1.115 1.01 1.266 -1.419 -0.537 ...
 $ TCGA-AA-3517-01A-01R-0821-07: Factor w/ 5604 levels "","-0.000333333333333335",..: 479 1182 4514 5003 4005 4799 1499 4796 849 3079 ...
 $ TCGA-AA-3529-01A-02R-0821-07: Factor w/ 5583 levels "","-0.000124999999999978",..: 2912 3970 4073 4555 4257 5238 3242 2668 899 3508 ...
 $ TCGA-AA-3549-01A-02R-0821-07: Factor w/ 5538 levels "","0.000166666666666671",..: 1378 4762 4356 4857 519 4739 1254 4777 350 444 ...
 $ TCGA-AA-3562-01A-02R-0821-07: Factor w/ 5628 levels "","0","0.000249999999999993",..: 2453 3556 3523 4987 2236 5148 1681 1854 2249 4096 ...

1 Answer:

Answer 0 (score: 1)

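data.matrix() converts factors to numbers by using their internal integer codes. That is why the columns that str(x) reports as factors show different values after data.matrix(), while the columns already stored as num are unchanged. To create a numeric matrix in this situation, try the following: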

# Convert to a character matrix first, then parse each column as numeric
y <- apply(as.matrix(x[, 1:5]), 2, as.numeric)

With as.matrix(), the factors become character strings. apply() then converts everything to numeric without losing the matrix structure.
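
To make the difference between a factor's integer codes and its labels concrete, here is a minimal sketch on a hypothetical toy data frame (`x_toy`, `to_num`, `x_num` and `y_toy` are illustrative names, not from the question). It converts only the factor columns via as.character() and leaves the numeric columns untouched. As a side effect it keeps the row names, whereas the apply(..., 2, as.numeric) call returns a matrix without them, because as.numeric() drops names.

```r
# Hypothetical toy data standing in for x: one factor column, one numeric column
x_toy <- data.frame(
  a = factor(c("0.018", "1.333", "-1.48725")),
  b = c(-0.20375, 2.49875, 0.483125),
  row.names = c("ELMO2", "CREB3L1", "RPS11")
)

# as.numeric() on a factor returns the internal integer codes
# (positions in levels()), which is what data.matrix() uses
codes <- as.numeric(x_toy$a)

# Going through as.character() recovers the numbers the labels spell out
values <- as.numeric(as.character(x_toy$a))

# Convert factor columns only; numeric columns pass through unchanged
to_num <- function(col) {
  if (is.factor(col)) as.numeric(as.character(col)) else col
}

x_num <- x_toy
x_num[] <- lapply(x_toy, to_num)  # assigning into x_num[] keeps row and column names
y_toy <- as.matrix(x_num)         # numeric matrix, since every column is now numeric
```

On the real data the same idea would be applied to `x[, 1:5]` (or all of `x`) in place of `x_toy`. Entries whose label is the empty string become NA in the result.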

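As Stephen Henderson mentioned in the comments, it is a good idea to find out why the numeric values stored in the data frame are being treated as factors in the first place.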
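
One possible starting point for that investigation, sketched under assumptions: in the str(x) output every factor column has the empty string "" among its levels, so it may help to list the levels that do not parse as numbers. The vector `f` and its "null" entry below are hypothetical stand-ins; the actual offending values in `x` are not shown in the question.

```r
# Hypothetical factor column with one empty and one non-numeric entry
f <- factor(c("0.018", "", "1.333", "null"))

lv <- levels(f)

# Levels that as.numeric() cannot parse come back as NA; collect them for inspection
bad <- lv[is.na(suppressWarnings(as.numeric(lv)))]
print(bad)
```

If the data came from a file read with read.table() and the culprit turns out to be a missing-value marker, declaring it through the na.strings argument when reading the file may let R parse those columns as numeric from the start.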