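My question seems to be related to this thread.
However, the approach given there does not work for me.
I defined a vector from my data set as: eduyears1994 <- year1994$q131ed
and got a vector that looks like this: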
[1] 17 lat/9 1O lat/3,4 1O lat/3,4 17 lat/9 17 lat/9 12 lat/5,6
1O lat/3,4 1O lat/3,4 12 lat/5,6
9 Levels: Brak formal wykszta³cenia 4 lata/1 8 lat/2 1O lat/3,4 12 lat/5,6
14 lat/7,8 ... BRAK DANYCH
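For example, "10 lat" stands for 10 years (of education) and "/3,4" is the factor label.
I just want a numeric variable, e.g. "10" in the column instead of "10 lat".
I tried the following and got this warning message:
eduyears1994n <- as.numeric(as.character(eduyears1994))
Warning message:
NAs introduced by coercion
I also tried doing it manually: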
eduyears1994[eduyears1994== "4 lata/1"] <- 4
eduyears1994[eduyears1994== "2"] <- 8
eduyears1994[eduyears1994== "17 lat"] <- 17
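But I get this warning:
In `[<-.factor`(`*tmp*`, eduyears1994 == "9", value = 17) :
  invalid factor level, NA generated
When I open the file in SPSS I see numbers rather than labels, but the variable's measurement level is set to nominal, which may be the cause of the problem.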
dput(eduyears1994)
c("17 lat/9", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "17 lat/9",
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "14 lat/7,8",
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "12 lat/5,6",
"12 lat/5,6", "17 lat/9", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6",
"12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4",
"1O lat/3,4", "14 lat/7,8", "17 lat/9", "1O lat/3,4", "1O lat/3,4",
"12 lat/5,6", "12 lat/5,6", "17 lat/9", "17 lat/9", "17 lat/9",
"17 lat/9", "12 lat/5,6", "12 lat/5,6", "14 lat/7,8", "12 lat/5,6",
"8 lat/2", "1O lat/3,4", "12 lat/5,6", "8 lat/2", "17 lat/9",
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4",
"17 lat/9", "8 lat/2", "8 lat/2", "1O lat/3,4", "1O lat/3,4",
"12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4", "14 lat/7,8",
"1O lat/3,4", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4", "17 lat/9",
"12 lat/5,6", "1O lat/3,4", "14 lat/7,8", "1O lat/3,4", "12 lat/5,6",
"12 lat/5,6", "1O lat/3,4", "8 lat/2", "12 lat/5,6", "1O lat/3,4",
"17 lat/9", "8 lat/2", "17 lat/9", "17 lat/9", "12 lat/5,6",
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4",
"17 lat/9", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4",
"8 lat/2", "8 lat/2", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6",
"4 lata/1", "12 lat/5,6", "1O lat/3,4", "14 lat/7,8", "12 lat/5,6",
"17 lat/9", "12 lat/5,6", "1O lat/3,4", "8 lat/2", "12 lat/5,6",
"17 lat/9", "17 lat/9", "17 lat/9", "1O lat/3,4", "17 lat/9",
"17 lat/9", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4",
"1O lat/3,4", "14 lat/7,8", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6",
"12 lat/5,6", "8 lat/2", "17 lat/9", "1O lat/3,4", "1O lat/3,4",
"1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4",
"8 lat/2", "14 lat/7,8", "1O lat/3,4", "8 lat/2", "1O lat/3,4",
"12 lat/5,6", "12 lat/5,6", "8 lat/2", "17 lat/9", "12 lat/5,6",
"12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6",
"12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4", "1O lat/3,4",
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6",
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "14 lat/7,8",
"8 lat/2", "8 lat/2", "1O lat/3,4", "1O lat/3,4", "8 lat/2",
"4 lata/1", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6",
"1O lat/3,4", "8 lat/2", "8 lat/2", "14 lat/7,8", "12 lat/5,6",
"8 lat/2", "8 lat/2", "14 lat/7,8", "8 lat/2", "14 lat/7,8",
"17 lat/9", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4",
"1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "17 lat/9",
"8 lat/2", "14 lat/7,8", "1O lat/3,4", "17 lat/9", "1O lat/3,4",
"8 lat/2", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4",
"4 lata/1", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "17 lat/9",
"17 lat/9", "17 lat/9", "8 lat/2", "12 lat/5,6", "1O lat/3,4",
"1O lat/3,4", "8 lat/2", "8 lat/2", "12 lat/5,6", "1O lat/3,4",
"12 lat/5,6", "17 lat/9", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6",
"1O lat/3,4", "17 lat/9", "17 lat/9", "8 lat/2", "12 lat/5,6",
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6",
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "17 lat/9",
"12 lat/5,6", "1O lat/3,4", "8 lat/2", "1O lat/3,4", "1O lat/3,4",
"17 lat/9", "12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4",
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "8 lat/2", "17 lat/9",
"1O lat/3,4", "1O lat/3,4", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4",
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "8 lat/2",
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "14 lat/7,8", "12 lat/5,6",
"12 lat/5,6", "14 lat/7,8", "1O lat/3,4", "17 lat/9", "17 lat/9",
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "17 lat/9",
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "17 lat/9",
"1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "17 lat/9",
"1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "8 lat/2", "12 lat/5,6",
"12 lat/5,6", "14 lat/7,8", "8 lat/2", "14 lat/7,8", "1O lat/3,4",
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "8 lat/2", "12 lat/5,6",
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "14 lat/7,8", "12 lat/5,6",
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "8 lat/2", "12 lat/5,6",
"12 lat/5,6", "14 lat/7,8", "12 lat/5,6", "14 lat/7,8", "17 lat/9",
"17 lat/9", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "14 lat/7,8",
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "14 lat/7,8",
"12 lat/5,6", "1O lat/3,4", "8 lat/2", "12 lat/5,6", "1O lat/3,4",
"8 lat/2", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4",
"14 lat/7,8", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "8 lat/2",
"17 lat/9", "17 lat/9", "8 lat/2", "14 lat/7,8", "1O lat/3,4",
"8 lat/2", "17 lat/9", "17 lat/9", "17 lat/9", "12 lat/5,6",
"17 lat/9", "12 lat/5,6", "12 lat/5,6", "17 lat/9", "1O lat/3,4",
"12 lat/5,6", "12 lat/5,6", "8 lat/2", "1O lat/3,4", "8 lat/2",
"8 lat/2", "8 lat/2", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4",
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "8 lat/2", "17 lat/9",
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "1O lat/3,4", "8 lat/2",
"17 lat/9", "17 lat/9", "14 lat/7,8", "17 lat/9", "1O lat/3,4",
"17 lat/9", "17 lat/9", "8 lat/2", "1O lat/3,4", "17 lat/9",
"1O lat/3,4", "12 lat/5,6", "8 lat/2", "12 lat/5,6", "12 lat/5,6",
"12 lat/5,6", "17 lat/9", "17 lat/9", "1O lat/3,4", "1O lat/3,4",
"1O lat/3,4", "12 lat/5,6", "8 lat/2", "12 lat/5,6", "8 lat/2",
"8 lat/2", "14 lat/7,8", "8 lat/2", "17 lat/9", "12 lat/5,6",
"1O lat/3,4", "14 lat/7,8", "17 lat/9", "1O lat/3,4", "17 lat/9",
"17 lat/9", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "8 lat/2",
"17 lat/9", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6",
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4",
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4",
"8 lat/2", "8 lat/2", "1O lat/3,4", "14 lat/7,8", "1O lat/3,4",
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "1O lat/3,4",
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4",
"14 lat/7,8", "12 lat/5,6", "8 lat/2", "1O lat/3,4", "12 lat/5,6",
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6",
"12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "14 lat/7,8", "12 lat/5,6",
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "17 lat/9", "12 lat/5,6",
"8 lat/2", "17 lat/9", "8 lat/2", "12 lat/5,6", "1O lat/3,4",
"17 lat/9", "12 lat/5,6", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4",
"1O lat/3,4", "12 lat/5,6", "17 lat/9", "1O lat/3,4", "17 lat/9",
"17 lat/9", "12 lat/5,6", "8 lat/2", "1O lat/3,4", "1O lat/3,4",
"17 lat/9", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4",
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "14 lat/7,8", "8 lat/2",
"12 lat/5,6", "12 lat/5,6", "8 lat/2", "8 lat/2", "1O lat/3,4",
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4",
"1O lat/3,4", "17 lat/9", "8 lat/2", "1O lat/3,4", "17 lat/9",
"17 lat/9", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "8 lat/2",
"1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6",
"8 lat/2", "12 lat/5,6", "14 lat/7,8", "1O lat/3,4", "1O lat/3,4",
"12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "17 lat/9", "17 lat/9",
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "12 lat/5,6", "8 lat/2",
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "12 lat/5,6", "1O lat/3,4",
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6",
"12 lat/5,6", "14 lat/7,8", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6",
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "17 lat/9", "12 lat/5,6",
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "17 lat/9",
"12 lat/5,6", "12 lat/5,6", "8 lat/2", "1O lat/3,4", "8 lat/2",
"12 lat/5,6", "8 lat/2", "17 lat/9", "8 lat/2", "12 lat/5,6",
"1O lat/3,4", "17 lat/9", "1O lat/3,4", "17 lat/9", "12 lat/5,6",
"14 lat/7,8", "17 lat/9", "17 lat/9", "12 lat/5,6", "1O lat/3,4",
"8 lat/2", "8 lat/2", "8 lat/2", "4 lata/1", "12 lat/5,6", "17 lat/9",
"12 lat/5,6", "17 lat/9", "14 lat/7,8", "14 lat/7,8", "1O lat/3,4",
"12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4",
"8 lat/2", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6",
"12 lat/5,6", "8 lat/2", "12 lat/5,6", "1O lat/3,4", "8 lat/2",
"8 lat/2", "1O lat/3,4", "8 lat/2", "1O lat/3,4", "14 lat/7,8",
"12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4",
"12 lat/5,6", "17 lat/9", "17 lat/9", "12 lat/5,6", "1O lat/3,4",
"17 lat/9", "1O lat/3,4", "12 lat/5,6", "12 lat/5,6", "12 lat/5,6",
"1O lat/3,4", "1O lat/3,4", "8 lat/2", "1O lat/3,4", "17 lat/9",
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "12 lat/5,6", "1O lat/3,4",
"1O lat/3,4", "1O lat/3,4", "12 lat/5,6", "8 lat/2", "1O lat/3,4",
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "14 lat/7,8", "12 lat/5,6",
"14 lat/7,8", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4",
"1O lat/3,4", "1O lat/3,4", "17 lat/9", "17 lat/9", "1O lat/3,4",
"8 lat/2", "1O lat/3,4", "1O lat/3,4", "8 lat/2", "8 lat/2",
"12 lat/5,6", "12 lat/5,6", "14 lat/7,8", "14 lat/7,8", "1O lat/3,4",
"17 lat/9", "17 lat/9", "12 lat/5,6", "12 lat/5,6", "1O lat/3,4",
"1O lat/3,4", "8 lat/2", "1O lat/3,4", "12 lat/5,6", "8 lat/2",
"1O lat/3,4", "12 lat/5,6", "1O lat/3,4", "1O lat/3,4", "1O lat/3,4",
"12 lat/5,6", "12 lat/5,6", "12 lat/5,6", "8 lat/2", "1O lat/3,4",
"1O lat/3,4", "8 lat/2", "12 lat/5,6", "8 lat/2", "1O lat/3,4",
"12 lat/5,6", "8 lat/2", "1O lat/3,4", "1O lat/3,4", "12 lat/5,6",
"14 lat/7,8", "1O lat/3,4", "17 lat/9", "1O lat/3,4", "1O lat/3,4"
)
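Answer 0 (score: 2)
With your actual data, you appear to have a character vector in the general format
n lat/a,b
where n is the number of years and "a,b" is some kind of label. This will extract the years.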
vec <- c("17 lat/9","10 lat/3,4","10 lat/3,4","17 lat/9","17 lat/9","12 lat/5,6","10 lat/3,4","10 lat/3,4","12 lat/5,6")
x <- strsplit(vec,split=" lat/",fixed=TRUE)
sapply(x,function(x)as.integer(x[1]))
# [1] 17 10 10 17 17 12 10 10 12
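Note that the dput() output in the question contains "1O" written with a capital letter O rather than a zero, and also "4 lata/1", which does not contain " lat/". For both of these, the split-and-convert approach above would return NA. The following is a minimal sketch that handles them, under the assumption (not confirmed by the asker) that the letter O is a mistyped zero:

```r
# Sample values copied from the question's dput() output
vec <- c("17 lat/9", "1O lat/3,4", "4 lata/1", "8 lat/2", "12 lat/5,6")

# Keep everything before the first space, i.e. the "years" token
years_chr <- sub(" .*$", "", vec)

# Assumption: a capital letter O in that token stands for the digit zero
years_chr <- gsub("O", "0", years_chr, fixed = TRUE)

years <- as.integer(years_chr)
years
# [1] 17 10  4  8 12
```

Levels without a leading number, such as "BRAK DANYCH" (missing data), would still become NA with a coercion warning, which is probably the desired result for that level.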
Answer 1 (score: 1)
You could try
c(17,8,4)[as.numeric(eduyears1994)]
#[1] 17 4 17 4 17 17 4 4 17 17 17 17 4 8 4 4 8 4 8 8
or
unname(c('4 lata/1'=4, '2'=8, '17 lat' =17)[as.character(eduyears1994)])
#[1] 17 4 17 4 17 17 4 4 17 17 17 17 4 8 4 4 8 4 8 8
If the 8 is actually a typo, you can use
library(stringi)
as.numeric(unlist(stri_extract_all_regex(eduyears1994, '^\\d+')))
#[1] 17 4 17 4 17 17 4 4 17 17 17 17 4 2 4 4 2 4 2 2
Example data used above:
set.seed(21)
eduyears1994 <- factor(sample(c('4 lata/1', 2, '17 lat'), 20, replace=TRUE))
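If installing stringi is not an option, the same idea can be sketched in base R. This version converts the factor's distinct levels once and then indexes that result by the factor's integer codes, which avoids the usual pitfall of calling as.numeric() directly on a factor. The names lev_num and res are only illustrative:

```r
# Same example data as above
set.seed(21)
eduyears1994 <- factor(sample(c('4 lata/1', 2, '17 lat'), 20, replace = TRUE))

# Extract the leading digits from each level, then convert to numeric
lev_num <- as.numeric(sub("^([0-9]+).*$", "\\1", levels(eduyears1994)))

# Map every observation to the numeric value of its level
res <- lev_num[as.integer(eduyears1994)]
res
```

As with the stringi version, this maps "2" to 2, not 8.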
Answer 2 (score: 1)
Using @akrun's example:
set.seed(21)
eduyears1994 <- factor(sample(c('4 lata/1', 2, '17 lat'), 20, replace=TRUE))
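Use gsub with the (apparently) correct regular expression (* means "zero or more of the preceding character or pattern", so for example "lata*" matches "lat" or "lata"):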
as.numeric(gsub(" lata*[/0-9,]*","",eduyears1994))
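Caveat: this converts "2" to 2 rather than 8, which is not what you asked for. I'm not sure about the logic of converting "4 lata/1" to 4, "17 lat" to 17, and "2" to 8. Perhaps you could explain? Or maybe it's a typo?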
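To see what the pattern does to individual strings, here is a small sketch on a hand-made vector. The values in x are illustrative, and "10 lat/3,4" is written with a digit zero, unlike the "1O" in the question's dput() output:

```r
x <- c("4 lata/1", "17 lat", "2", "10 lat/3,4")

# Remove " lat" or " lata" plus any trailing slash, digits and commas
cleaned <- gsub(" lata*[/0-9,]*", "", x)

nums <- as.numeric(cleaned)
nums
# [1]  4 17  2 10
```

Strings with no match, such as "2", pass through unchanged, which is why "2" stays 2.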