How to convert factors containing decimal points to numeric

Asked: 2013-12-21 04:13:01

Tags: r numeric r-factor

I have a data set whose columns are factor vectors:

> str(gdp)
'data.frame':   64 obs. of  31 variables:
 $ 1 : Factor w/ 62 levels "","1,145.31",..: 1 1 1 53 16 20 22 24 30 32 ...
 $ 2 : Factor w/ 64 levels "1,121.93","1,264.63",..: 42 59 10 13 18 16 17 23 25 35 ...
 $ 3 : Factor w/ 62 levels "","1,072.07",..: 1 1 1 35 36 39 41 42 45 51 ...
 $ 4 : Factor w/ 62 levels "","1,076.03",..: 1 1 1 15 16 21 23 26 27 36 ...
 $ 5 : Factor w/ 62 levels "","1,023.09",..: 1 1 1 11 15 19 17 23 21 27 ...
 $ 6 : Factor w/ 62 levels "","1,003.81",..: 1 1 1 40 45 46 47 52 56 7 ...
 $ 7 : Factor w/ 62 levels "","1,137.23",..: 1 1 1 13 15 19 21 23 24 28 ...
 $ 8 : Factor w/ 62 levels "","1,198.30",..: 1 1 1 26 31 34 35 39 40 47 ...
 $ 9 : Factor w/ 64 levels "1,114.32","1,519.23",..: 27 30 36 41 49 51 50 54 56 64 ...
 $ 10: Factor w/ 62 levels "","1,208.85",..: 1 1 1 35 39 40 42 45 46 53 ...
 $ 11: Factor w/ 64 levels "","1,089.33",..: 1 11 17 20 23 24 26 29 31 37 ...
 $ 12: Factor w/ 62 levels "","1,037.14",..: 1 1 1 22 23 25 31 30 36 41 ...
 $ 13: Factor w/ 63 levels "","1,114.20",..: 1 63 1 8 11 12 14 20 22 27 ...
 $ 14: Factor w/ 64 levels "1,169.73","1,409.74",..: 63 12 14 16 17 22 24 25 28 30 ...
 $ 15: Factor w/ 62 levels "","1,117.66",..: 1 1 1 33 35 39 40 44 43 53 ...
 $ 16: Factor w/ 63 levels "","1,045.73",..: 21 1 1 30 35 38 41 42 47 50 ...
 $ 17: Factor w/ 62 levels "","1,088.39",..: 1 1 1 24 32 26 34 38 40 48 ...
 $ 18: Factor w/ 62 levels "","1,244.71",..: 1 1 1 24 30 31 33 34 38 44 ...
 $ 19: Factor w/ 62 levels "","1,155.37",..: 1 1 1 25 34 37 38 41 44 48 ...
 $ 20: Factor w/ 64 levels "","1,198.29",..: 1 63 8 11 15 17 18 20 26 30 ...
 $ 21: Factor w/ 36 levels "","1,065.67",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ 22: Factor w/ 64 levels "1,123.06","1,315.12",..: 12 14 15 17 22 23 24 26 27 40 ...
 $ 23: Factor w/ 62 levels "","1,016.31",..: 1 1 1 22 25 31 33 38 43 49 ...
 $ 24: Factor w/ 64 levels "1,029.92","1,133.27",..: 52 53 57 60 6 8 9 12 13 22 ...
 $ 25: Factor w/ 64 levels "1,222.15","1,517.69",..: 60 62 7 8 12 14 15 21 22 25 ...
 $ 26: num  NA NA 1.29 1.32 1.36 1.39 1.43 1.62 1.56 1.72 ...
 $ 27: Factor w/ 62 levels "","1,036.85",..: 1 1 1 12 16 21 22 27 25 33 ...
 $ 28: Factor w/ 61 levels "","1,052.88",..: 1 1 1 12 13 17 18 24 23 26 ...
 $ 29: Factor w/ 64 levels "1,018.62","1,081.27",..: 6 7 8 9 10 26 27 34 35 43 ...
 $ 30: Factor w/ 62 levels "","1,203.92",..: 1 1 1 6 5 21 22 23 24 32 ...
 $ 31: Factor w/ 62 levels "","1,039.85",..: 1 1 1 57 59 9 11 13 14 16 ...

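I am trying to keep all the information (the decimal places) and convert every vector to numeric. So far I have tried converting the vectors to character and then to numeric, as suggested elsewhere on SO, but I get: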

> gdp<-data.frame(lapply(gdp,as.character))
> gdp<-data.frame(lapply(gdp,as.numeric))
> str(gdp)
'data.frame':   64 obs. of  31 variables:
 $ X1 : num  1 1 1 53 16 20 22 24 30 32 ...
 $ X2 : num  42 59 10 13 18 16 17 23 25 35 ...
 $ X3 : num  1 1 1 35 36 39 41 42 45 51 ...
 $ X4 : num  1 1 1 15 16 21 23 26 27 36 ...
 $ X5 : num  1 1 1 11 15 19 17 23 21 27 ...
 $ X6 : num  1 1 1 40 45 46 47 52 56 7 ...
 $ X7 : num  1 1 1 13 15 19 21 23 24 28 ...
 $ X8 : num  1 1 1 26 31 34 35 39 40 47 ...
 $ X9 : num  27 30 36 41 49 51 50 54 56 64 ...
 $ X10: num  1 1 1 35 39 40 42 45 46 53 ...
 $ X11: num  1 11 17 20 23 24 26 29 31 37 ...
 $ X12: num  1 1 1 22 23 25 31 30 36 41 ...
 $ X13: num  1 63 1 8 11 12 14 20 22 27 ...
 $ X14: num  63 12 14 16 17 22 24 25 28 30 ...
 $ X15: num  1 1 1 33 35 39 40 44 43 53 ...
 $ X16: num  21 1 1 30 35 38 41 42 47 50 ...
 $ X17: num  1 1 1 24 32 26 34 38 40 48 ...
 $ X18: num  1 1 1 24 30 31 33 34 38 44 ...
 $ X19: num  1 1 1 25 34 37 38 41 44 48 ...
 $ X20: num  1 63 8 11 15 17 18 20 26 30 ...
 $ X21: num  1 1 1 1 1 1 1 1 1 1 ...
 $ X22: num  12 14 15 17 22 23 24 26 27 40 ...
 $ X23: num  1 1 1 22 25 31 33 38 43 49 ...
 $ X24: num  52 53 57 60 6 8 9 12 13 22 ...
 $ X25: num  60 62 7 8 12 14 15 21 22 25 ...
 $ X26: num  NA NA 1 2 3 4 5 7 6 8 ...
 $ X27: num  1 1 1 12 16 21 22 27 25 33 ...
 $ X28: num  1 1 1 12 13 17 18 24 23 26 ...
 $ X29: num  6 7 8 9 10 26 27 34 35 43 ...
 $ X30: num  1 1 1 6 5 21 22 23 24 32 ...
 $ X31: num  1 1 1 57 59 9 11 13 14 16 ...

This neither keeps the decimal places nor turns the blanks into NA. I have also tried:

> gdp<-as.numeric(levels(gdp))[gdp]
Error in as.numeric(levels(gdp))[gdp] : invalid subscript type 'list'

Is there a way to convert these vectors to numeric?

1 Answer:

Answer 0: (score: 0)

Let's break this down.

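First, because gdp is a data frame, levels(gdp) returns NULL. You are probably after the output of levels for each column of gdp. In that case you need to use something like lapply:
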
levels(gdp)
# NULL
lapply(gdp, levels)
# this output will make sense
as.numeric(levels(gdp))[gdp]
# this will make no sense

The error is telling you that you cannot use a list (gdp) to subscript a vector.
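
To make the idiom concrete, here is a minimal sketch on a single factor. The factor f and its values are made up for illustration and are not taken from gdp. It shows that coercing a factor directly gives the internal level codes, while converting the levels and then indexing them by the factor recovers the values.

```r
# Hypothetical toy factor; the values are illustrative only
f <- factor(c("", "1,145.31", "987.50", "1,145.31"))

codes <- as.integer(f)   # 1 2 3 2: level codes, not the values

# Strip the thousands separator from the levels, convert them once,
# then use the factor itself as an index into the converted levels
num_lv <- as.numeric(gsub(",", "", levels(f), fixed = TRUE))
vals <- num_lv[f]        # NA 1145.31 987.50 1145.31
```

The blank level becomes NA because as.numeric("") is NA.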

To go over the columns of gdp, you need something like lapply to handle each component:

gdp <- data.frame(lapply(gdp, function(x) {
    if(!is.factor(x)) x 
    else as.numeric(gsub(",","",levels(x),fixed=TRUE))[x] 
}))
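
The same conversion can be tried on a small stand-in before running it on the real data. This is only a sketch: the data frame toy, its column names, and the helper to_num are hypothetical and not part of the original question.

```r
# Hypothetical stand-in for gdp: one factor column and one numeric column
toy <- data.frame(
  a = factor(c("", "1,145.31", "987.50")),
  b = c(NA, 1.29, 1.32)
)

# Illustrative helper: convert factor columns, leave everything else alone
to_num <- function(x) {
  if (!is.factor(x)) return(x)
  as.numeric(gsub(",", "", levels(x), fixed = TRUE))[x]
}

# Assigning into toy[] keeps the data frame structure and column names
toy[] <- lapply(toy, to_num)
str(toy)
```

Assigning into toy[] is a design choice of this sketch: it keeps the existing column names, whereas rebuilding with data.frame() applies its default name checking, which is why the columns 1, 2, ... showed up as X1, X2, ... in the question.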

Your data set may be better handled as a matrix, since it appears to be entirely numeric. In that case:

gdp <- as.matrix(gdp)
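
Note that the matrix is only numeric if every column has already been converted. The following sketch, using made-up data frames num_df and mixed_df, illustrates the difference:

```r
# Hypothetical data: all columns already numeric
num_df <- data.frame(a = c(NA, 1145.31, 987.50), b = c(NA, 1.29, 1.32))
m_num <- as.matrix(num_df)
is.numeric(m_num)        # TRUE

# Hypothetical data: one column is still a factor
mixed_df <- data.frame(a = factor(c("", "1,145.31", "987.50")),
                       b = c(NA, 1.29, 1.32))
m_chr <- as.matrix(mixed_df)
is.character(m_chr)      # TRUE: the whole matrix falls back to character
```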