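Converting factor/nominal variables to numeric in R

Asked: 2014-12-28 19:59:47

Tags: r variables type-conversion numeric

My question seems to be related to this thread.

However, the methods given there do not work for me.

I defined a vector from my data set as eduyears1994 <- year1994$q131ed and got a vector that looks like this: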

[1] 17 lat/9   1O lat/3,4 1O lat/3,4 17 lat/9   17 lat/9   12 lat/5,6
                                        1O lat/3,4 1O lat/3,4 12 lat/5,6
   9 Levels: Brak formal wykszta³cenia 4 lata/1 8 lat/2 1O lat/3,4 12 lat/5,6 
     14 lat/7,8 ... BRAK DANYCH

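For example, "10 lat" stands for 10 years (of education), and "/3,4" is the factor label.

I just want a numeric variable, e.g. "10" in the column instead of "10 years".

I tried the following and got this warning message:

eduyears1994n <- as.numeric(as.character(eduyears1994))
     Warning message:
      NAs introduced by coercion

I also tried doing it manually: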

eduyears1994[eduyears1994== "4 lata/1"] <- 4
eduyears1994[eduyears1994== "2"] <- 8
eduyears1994[eduyears1994== "17 lat"] <- 17

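But the error message reads:

In `[<-.factor`(`*tmp*`, eduyears1994 == "9", value = 17) :
  invalid factor level, NA generated

When I open the file in SPSS I see numbers rather than labels, but the data format is specified as nominal, which may be the cause of the problem.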

dput(eduyears1994)
c("17 lat/9", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "17 lat/9", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "14 lat/7,8", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "12 lat/5,6", 
"12 lat/5,6", "17 lat/9", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4", 
"1O lat/3,4", "14 lat/7,8", "17 lat/9", "1O lat/3,4", "1O lat/3,4", 
"12 lat/5,6", "12 lat/5,6", "17 lat/9", "17 lat/9", "17 lat/9", 
"17 lat/9", "12 lat/5,6", "12 lat/5,6", "14 lat/7,8", "12 lat/5,6", 
"8 lat/2", "1O lat/3,4", "12 lat/5,6", "8 lat/2", "17 lat/9", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"17 lat/9", "8 lat/2", "8 lat/2", "1O lat/3,4", "1O lat/3,4", 
"12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4", "14 lat/7,8", 
"1O lat/3,4", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4", "17 lat/9", 
"12 lat/5,6", "1O lat/3,4", "14 lat/7,8", "1O lat/3,4", "12 lat/5,6", 
"12 lat/5,6", "1O lat/3,4", "8 lat/2", "12 lat/5,6", "1O lat/3,4", 
"17 lat/9", "8 lat/2", "17 lat/9", "17 lat/9", "12 lat/5,6", 
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", 
"17 lat/9", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", 
"8 lat/2", "8 lat/2", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", 
"4 lata/1", "12 lat/5,6", "1O lat/3,4", "14 lat/7,8", "12 lat/5,6", 
"17 lat/9", "12 lat/5,6", "1O lat/3,4", "8 lat/2", "12 lat/5,6", 
"17 lat/9", "17 lat/9", "17 lat/9", "1O lat/3,4", "17 lat/9", 
"17 lat/9", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "14 lat/7,8", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"12 lat/5,6", "8 lat/2", "17 lat/9", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4", 
"8 lat/2", "14 lat/7,8", "1O lat/3,4", "8 lat/2", "1O lat/3,4", 
"12 lat/5,6", "12 lat/5,6", "8 lat/2", "17 lat/9", "12 lat/5,6", 
"12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "14 lat/7,8", 
"8 lat/2", "8 lat/2", "1O lat/3,4", "1O lat/3,4", "8 lat/2", 
"4 lata/1", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"1O lat/3,4", "8 lat/2", "8 lat/2", "14 lat/7,8", "12 lat/5,6", 
"8 lat/2", "8 lat/2", "14 lat/7,8", "8 lat/2", "14 lat/7,8", 
"17 lat/9", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "17 lat/9", 
"8 lat/2", "14 lat/7,8", "1O lat/3,4", "17 lat/9", "1O lat/3,4", 
"8 lat/2", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", 
"4 lata/1", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "17 lat/9", 
"17 lat/9", "17 lat/9", "8 lat/2", "12 lat/5,6", "1O lat/3,4", 
"1O lat/3,4", "8 lat/2", "8 lat/2", "12 lat/5,6", "1O lat/3,4", 
"12 lat/5,6", "17 lat/9", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", 
"1O lat/3,4", "17 lat/9", "17 lat/9", "8 lat/2", "12 lat/5,6", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", 
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "17 lat/9", 
"12 lat/5,6", "1O lat/3,4", "8 lat/2", "1O lat/3,4", "1O lat/3,4", 
"17 lat/9", "12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "8 lat/2", "17 lat/9", 
"1O lat/3,4", "1O lat/3,4", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "8 lat/2", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "14 lat/7,8", "12 lat/5,6", 
"12 lat/5,6", "14 lat/7,8", "1O lat/3,4", "17 lat/9", "17 lat/9", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "17 lat/9", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "17 lat/9", 
"1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "17 lat/9", 
"1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "8 lat/2", "12 lat/5,6", 
"12 lat/5,6", "14 lat/7,8", "8 lat/2", "14 lat/7,8", "1O lat/3,4", 
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "8 lat/2", "12 lat/5,6", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "14 lat/7,8", "12 lat/5,6", 
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "8 lat/2", "12 lat/5,6", 
"12 lat/5,6", "14 lat/7,8", "12 lat/5,6", "14 lat/7,8", "17 lat/9", 
"17 lat/9", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "14 lat/7,8", 
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "14 lat/7,8", 
"12 lat/5,6", "1O lat/3,4", "8 lat/2", "12 lat/5,6", "1O lat/3,4", 
"8 lat/2", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", 
"14 lat/7,8", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "8 lat/2", 
"17 lat/9", "17 lat/9", "8 lat/2", "14 lat/7,8", "1O lat/3,4", 
"8 lat/2", "17 lat/9", "17 lat/9", "17 lat/9", "12 lat/5,6", 
"17 lat/9", "12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4", 
"12 lat/5,6", "12 lat/5,6", "8 lat/2", "1O lat/3,4", "8 lat/2", 
"8 lat/2", "8 lat/2", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "8 lat/2", "17 lat/9", 
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "1O lat/3,4", "8 lat/2", 
"17 lat/9", "17 lat/9", "14 lat/7,8", "17 lat/9", "1O lat/3,4", 
"17 lat/9", "17 lat/9", "8 lat/2", "1O lat/3,4", "17 lat/9", 
"1O lat/3,4", "12 lat/5,6", "8 lat/2", "12 lat/5,6", "12 lat/5,6", 
"12 lat/5,6", "17 lat/9", "17 lat/9", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "12 lat/5,6", "8 lat/2", "12 lat/5,6", "8 lat/2", 
"8 lat/2", "14 lat/7,8", "8 lat/2", "17 lat/9", "12 lat/5,6", 
"1O lat/3,4", "14 lat/7,8", "17 lat/9", "1O lat/3,4", "17 lat/9", 
"17 lat/9", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "8 lat/2", 
"17 lat/9", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", 
"8 lat/2", "8 lat/2", "1O lat/3,4", "14 lat/7,8", "1O lat/3,4", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", 
"14 lat/7,8", "12 lat/5,6", "8 lat/2", "1O lat/3,4", "12 lat/5,6", 
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", 
"12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "14 lat/7,8", "12 lat/5,6", 
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "17 lat/9", "12 lat/5,6", 
"8 lat/2", "17 lat/9", "8 lat/2", "12 lat/5,6", "1O lat/3,4", 
"17 lat/9", "12 lat/5,6", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "12 lat/5,6", "17 lat/9", "1O lat/3,4", "17 lat/9", 
"17 lat/9", "12 lat/5,6", "8 lat/2", "1O lat/3,4", "1O lat/3,4", 
"17 lat/9", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "14 lat/7,8", "8 lat/2", 
"12 lat/5,6", "12 lat/5,6", "8 lat/2", "8 lat/2", "1O lat/3,4", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "17 lat/9", "8 lat/2", "1O lat/3,4", "17 lat/9", 
"17 lat/9", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "8 lat/2", 
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", 
"8 lat/2", "12 lat/5,6", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4", 
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "17 lat/9", 
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "12 lat/5,6", "8 lat/2", 
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "12 lat/5,6", "1O lat/3,4", 
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"12 lat/5,6", "14 lat/7,8", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", 
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "17 lat/9", "12 lat/5,6", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "17 lat/9", 
"12 lat/5,6", "12 lat/5,6", "8 lat/2", "1O lat/3,4", "8 lat/2", 
"12 lat/5,6", "8 lat/2", "17 lat/9", "8 lat/2", "12 lat/5,6", 
"1O lat/3,4", "17 lat/9", "1O lat/3,4", "17 lat/9", "12 lat/5,6", 
"14 lat/7,8", "17 lat/9", "17 lat/9", "12 lat/5,6", "1O lat/3,4", 
"8 lat/2", "8 lat/2", "8 lat/2", "4 lata/1", "12 lat/5,6", "17 lat/9", 
"12 lat/5,6", "17 lat/9", "14 lat/7,8", "14 lat/7,8", "1O lat/3,4", 
"12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", 
"8 lat/2", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", 
"12 lat/5,6", "8 lat/2", "12 lat/5,6", "1O lat/3,4", "8 lat/2", 
"8 lat/2", "1O lat/3,4", "8 lat/2", "1O lat/3,4", "14 lat/7,8", 
"12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", 
"12 lat/5,6", "17 lat/9", "17 lat/9", "12 lat/5,6", "1O lat/3,4", 
"17 lat/9", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", 
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "1O lat/3,4", "17 lat/9", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "8 lat/2", "1O lat/3,4", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "14 lat/7,8", "12 lat/5,6", 
"14 lat/7,8", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", 
"1O lat/3,4", "1O lat/3,4", "17 lat/9", "17 lat/9", "1O lat/3,4", 
"8 lat/2", "1O lat/3,4", "1O lat/3,4", "8 lat/2", "8 lat/2", 
"12 lat/5,6", "12 lat/5,6", "14 lat/7,8", "14 lat/7,8", "1O lat/3,4", 
"17 lat/9", "17 lat/9", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", 
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "8 lat/2", 
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", 
"12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "8 lat/2", "1O lat/3,4", 
"1O lat/3,4", "8 lat/2", "12 lat/5,6", "8 lat/2", "1O lat/3,4", 
"12 lat/5,6", "8 lat/2", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", 
"14 lat/7,8", "1O lat/3,4", "17 lat/9", "1O lat/3,4", "1O lat/3,4"
)

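3 answers:

Answer 0 (score: 2)

With your actual data, you appear to have a character vector in the general format

n lat/a,b

where n is the number of years and "a,b" is some kind of label. This extracts the years.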

vec <- c("17 lat/9","10 lat/3,4","10 lat/3,4","17 lat/9","17 lat/9","12 lat/5,6","10 lat/3,4","10 lat/3,4","12 lat/5,6")
x <- strsplit(vec,split=" lat/",fixed=TRUE)
sapply(x,function(x)as.integer(x[1]))
# [1] 17 10 10 17 17 12 10 10 12
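
One thing worth checking against the dput() output in the question: the entries there read "1O lat/3,4" with a capital letter O rather than a zero, and one level is "4 lata/1" rather than "4 lat/1". If those spellings are really in the data (an assumption, since they could also be a transcription slip), splitting on " lat/" and calling as.integer() would return NA for both. A minimal sketch that tolerates both cases, with the sample values copied from the question:

```r
# Sample values copied from the dput() output in the question.
# "1O" here is the digit one followed by a capital letter O.
vec <- c("17 lat/9", "1O lat/3,4", "8 lat/2", "4 lata/1", "12 lat/5,6")

# Keep only the token before the first space ("17", "1O", "8", ...).
years_chr <- sub(" .*$", "", vec)

# Assumption: a capital O inside that token is a mistyped zero.
years_chr <- gsub("O", "0", years_chr, fixed = TRUE)

years <- as.integer(years_chr)
years
# [1] 17 10  8  4 12
```

The same calls accept a factor as well, because sub() coerces its input to character first.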

Answer 1 (score: 1)

You could try

c(17,8,4)[as.numeric(eduyears1994)]
#[1] 17  4 17  4 17 17  4  4 17 17 17 17  4  8  4  4  8  4  8  8

 unname(c('4 lata/1'=4, '2'=8, '17 lat' =17)[as.character(eduyears1994)])
 #[1] 17  4 17  4 17 17  4  4 17 17 17 17  4  8  4  4  8  4  8  8
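
The first line above works because as.numeric() on a factor returns the integer level codes, so the lookup vector has to be written in the same order as levels(eduyears1994). A sketch of a variant that avoids hard-coding that order by converting the levels themselves and then indexing by code; the factor f below is a hypothetical example built from labels in the question, not the asker's full data:

```r
# Hypothetical small factor using labels from the question.
f <- factor(c("4 lata/1", "8 lat/2", "17 lat/9", "8 lat/2"))

# Convert each level once: keep the part before the first space.
level_years <- as.numeric(sub(" .*$", "", levels(f)))

# Index the converted levels by the factor's integer codes.
years <- level_years[as.integer(f)]
years
# [1]  4  8 17  8
```

Because the conversion runs over levels(f) rather than over every element, it stays cheap on long vectors and does not depend on how the levels happen to be sorted.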

If the 8 is actually a typo, you could use

 library(stringi)
 as.numeric(unlist(stri_extract_all_regex(eduyears1994, '^\\d+')))
 #[1] 17  4 17  4 17 17  4  4 17 17 17 17  4  2  4  4  2  4  2  2

Data

set.seed(21)
eduyears1994 <- factor(sample(c('4 lata/1', 2, '17 lat'), 20, replace=TRUE))

Answer 2 (score: 1)

Using @akrun's example:

set.seed(21)
eduyears1994 <- factor(sample(c('4 lata/1', 2, '17 lat'), 20, replace=TRUE))

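Using gsub and an (apparently) correct regular expression (* means "zero or more of the preceding character or pattern", so for example "lata*" matches "lat" or "lata"):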

as.numeric(gsub(" lata*[/0-9,]*","",eduyears1994))

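Caveat: this converts "2" to 2 rather than 8, which is not what you asked for. I'm not quite sure of the logic of converting "4 lata/1" to 4, "17 lat" to 17, and "2" to 8; perhaps you could explain? Maybe it's a typo?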
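
To see how the pattern from this answer behaves on labels in the format shown in the question's dput() output, here is a small sketch; the vector labels is a hypothetical sample, not the full data:

```r
# Hypothetical sample: three labels in the question's format plus a bare "2".
labels <- c("4 lata/1", "17 lat/9", "12 lat/5,6", "2")

# " lata*" matches " lat" or " lata"; "[/0-9,]*" then consumes the
# trailing "/1", "/9" or "/5,6". A string without " lat" is left alone.
cleaned <- as.numeric(gsub(" lata*[/0-9,]*", "", labels))
cleaned
# [1]  4 17 12  2
```

Any label that still contains a non-digit after the substitution would come back as NA with a coercion warning, so it may be worth checking anyNA(cleaned) afterwards.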