I need to replace strings with numbers across multiple columns. Here is a sample dataset:
x <- c("Low Outlier", "High Outlier", "Novice", "Novice", "Emerging", NA, "Proficient", "Approaching")
y <- c("Novice", "Approaching", "Proficient", "Approaching", "High Outlier", "Proficient",NA, "Emerging")
z <- c("High Outlier", "Proficient", "Approaching", "Emerging", "Low Outlier", "Approaching", "Approaching", "Emerging")
sam <- cbind(x,y,z)
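I need to convert "High Outlier" and "Low Outlier" to 0, keep NA as NA, and convert "Novice" to 1, "Emerging" to 2, "Approaching" to 3, and "Proficient" to 4.
I tried recoding a single variable with: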
sam$x.r <- recode(sam$x.r,'Low Outlier'=0,'High Outlier'=0,'Novice'=1,'Emerging'=2,'Approaching'=3, 'Proficient'=4)
I get this warning:
Warning message:
In recode.numeric(Dat17_18.1$I.E.ScoreStat, Low Outlier = 0, High Outlier = 0, :
  NAs introduced by coercion
I'm also not sure how to recode all of the variables at once.
Answer 0 (score: 4)
Just do this:
=IF(A2="First Line",C3,IF(A1="",C1,Match("First Line",A2:A500,0)))
Answer 1 (score: 2)
In this case, we can use case_when from dplyr:
library(dplyr)
sam %>%
mutate_all(~case_when(. %in% c("Low Outlier", "High Outlier") ~ '0',
. == "Novice" ~ '1',
. == "Emerging" ~ '2',
. == "Approaching" ~ '3',
. == "Proficient" ~ '4',
TRUE ~ NA_character_))
# x y z
#1 0 1 0
#2 0 3 4
#3 1 4 3
#4 1 3 2
#5 2 0 0
#6 <NA> 4 3
#7 4 <NA> 3
#8 3 2 2
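Because the original columns are character, the replacement values above are also given as strings ('0', '1', ...), so the output columns are character. If needed, append mutate_all(as.numeric) to convert them to numeric.
Data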
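If you would rather do that last conversion without another dplyr verb, base R can convert every column in place. This is a minimal sketch, not part of the original answer; the data frame name recoded is hypothetical and simply holds the character output shown above.

```r
# Hypothetical stand-in for the character output of the case_when pipeline above
recoded <- data.frame(
  x = c("0", "0", "1", "1", "2", NA, "4", "3"),
  y = c("1", "3", "4", "3", "0", "4", NA, "2"),
  z = c("0", "4", "3", "2", "0", "3", "3", "2"),
  stringsAsFactors = FALSE
)

# Convert every column to numeric; assigning into recoded[] keeps the data.frame structure
recoded[] <- lapply(recoded, as.numeric)
```

as.numeric() turns the NA strings into NA_real_ without a coercion warning, because NA is a real missing value here and not a non-numeric label.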
x <- c("Low Outlier", "High Outlier", "Novice", "Novice", "Emerging", NA,
"Proficient", "Approaching")
y <- c("Novice", "Approaching", "Proficient", "Approaching", "High Outlier",
"Proficient",NA, "Emerging")
z <- c("High Outlier", "Proficient", "Approaching", "Emerging", "Low Outlier",
"Approaching", "Approaching", "Emerging")
sam <- data.frame(x,y,z, stringsAsFactors = FALSE)
Answer 2 (score: 2)
Really repetitive, but quick. Here is a simple function:
my_replacer<-function(df,y,z){
df<-as.data.frame(apply(df,2,function(x) gsub(y,z,x)))
#y is what you want to replace
#z is the replacement
#This uses regex
df
}
my_replacer(sam,"Emerging.*","2")
Here is how I used it:
library(dplyr)#can use ifelse. Still repetitive
sam<-as.data.frame(sam)
sam %>%
mutate_if(is.factor,as.character)->sam
my_replacer(sam,"Emerging.*","2")
Result:
x y z
1 Low Outlier Novice High Outlier
2 High Outlier Approaching Proficient
3 Novice Proficient Approaching
4 Novice Approaching 2
5 2 High Outlier Low Outlier
6 <NA> Proficient Approaching
7 Proficient <NA> Approaching
8 Approaching 2 2
Replacing the others:
my_replacer(sam,"Novi.*","1")
x y z
1 Low Outlier 1 High Outlier
2 High Outlier Approaching Proficient
3 1 Proficient Approaching
4 1 Approaching Emerging
5 Emerging High Outlier Low Outlier
6 <NA> Proficient Approaching
7 Proficient <NA> Approaching
8 Approaching Emerging Emerging
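Since calling my_replacer() once per label is repetitive, one option is to fold over a named vector of pattern/replacement pairs with base R's Reduce(), so all labels are replaced in a single call. This is a sketch under assumptions, not the answerer's code: the replacements vector and its anchored patterns (^...$, so a label must match exactly) are illustrative.

```r
# my_replacer() as defined in this answer: regex replacement in every column
my_replacer <- function(df, y, z) {
  df <- as.data.frame(apply(df, 2, function(x) gsub(y, z, x)))
  df
}

x <- c("Low Outlier", "High Outlier", "Novice", "Novice", "Emerging", NA,
       "Proficient", "Approaching")
y <- c("Novice", "Approaching", "Proficient", "Approaching", "High Outlier",
       "Proficient", NA, "Emerging")
z <- c("High Outlier", "Proficient", "Approaching", "Emerging", "Low Outlier",
       "Approaching", "Approaching", "Emerging")
sam <- data.frame(x, y, z, stringsAsFactors = FALSE)

# Hypothetical lookup: anchored regex pattern -> replacement string
replacements <- c("^(Low|High) Outlier$" = "0",
                  "^Novice$"             = "1",
                  "^Emerging$"           = "2",
                  "^Approaching$"        = "3",
                  "^Proficient$"         = "4")

# Left fold: each call receives the data frame produced by the previous call
result <- Reduce(function(df, pat) my_replacer(df, pat, replacements[[pat]]),
                 names(replacements), sam)

# as.character() first in case as.data.frame() produced factors (R < 4.0)
result[] <- lapply(result, function(col) as.numeric(as.character(col)))
```

The order of the pairs does not matter here, because none of the replacement digits can match any of the label patterns.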
Answer 3 (score: 1)
I would use a named vector as the mapping:
library(dplyr)
mapping = c("High Outlier" = 0, "Low Outlier" = 0, "Novice" = 1, "Emerging" = 2, "Approaching" = 3, "Proficient" = 4)
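sam %>%
  as.data.frame() %>%
  # as.character() guards against factor columns (the as.data.frame() default before R 4.0),
  # which would otherwise index mapping by their integer codes instead of by label
  mutate_all(function(i) mapping[as.character(i)])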
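The same named-vector lookup also works without dplyr. The sketch below is a base-R equivalent (an illustration, not the answerer's code); it assumes sam is a data frame of character columns, as in the Data section of Answer 1.

```r
x <- c("Low Outlier", "High Outlier", "Novice", "Novice", "Emerging", NA,
       "Proficient", "Approaching")
y <- c("Novice", "Approaching", "Proficient", "Approaching", "High Outlier",
       "Proficient", NA, "Emerging")
z <- c("High Outlier", "Proficient", "Approaching", "Emerging", "Low Outlier",
       "Approaching", "Approaching", "Emerging")
sam <- data.frame(x, y, z, stringsAsFactors = FALSE)

mapping <- c("High Outlier" = 0, "Low Outlier" = 0, "Novice" = 1,
             "Emerging" = 2, "Approaching" = 3, "Proficient" = 4)

# Look up each label by name; NA (or any label missing from mapping) yields NA.
# unname() drops the label names that the lookup attaches to the result.
sam[] <- lapply(sam, function(col) unname(mapping[as.character(col)]))
```

Any label that is not in mapping silently becomes NA, so it is worth checking for unexpected NAs after the lookup.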
Answer 4 (score: 0)
Another solution, using factor to recode and approxfun to assign the values:
# Factor codes 1..6 follow the level order below; approxfun maps them to 0, 1, 2, 3, 4, 0
sam[] <- approxfun(1:6, c(0:4, 0))(
  as.numeric(factor(sam,
                    c("Low Outlier", "Novice",
                      "Emerging", "Approaching",
                      "Proficient", "High Outlier"))))
#      x   y   z
# [1,] "0" "1" "0"
# [2,] "0" "3" "4"
# [3,] "1" "4" "3"
# [4,] "1" "3" "2"
# [5,] "2" "0" "0"
# [6,] NA  "4" "3"
# [7,] "4" NA  "3"
# [8,] "3" "2" "2"
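Because approxfun() is only ever evaluated at the integer factor codes here, it acts as a lookup table. A sketch of the same idea with plain indexing, which also keeps the result numeric (assigning through sam[] on the character matrix turns the numbers back into strings, as the quoted output above shows). This is an alternative, not the answer's code; lvls and codes are illustrative names.

```r
x <- c("Low Outlier", "High Outlier", "Novice", "Novice", "Emerging", NA,
       "Proficient", "Approaching")
y <- c("Novice", "Approaching", "Proficient", "Approaching", "High Outlier",
       "Proficient", NA, "Emerging")
z <- c("High Outlier", "Proficient", "Approaching", "Emerging", "Low Outlier",
       "Approaching", "Approaching", "Emerging")
sam <- cbind(x, y, z)  # character matrix, as in the question

lvls  <- c("Low Outlier", "Novice", "Emerging", "Approaching", "Proficient", "High Outlier")
codes <- c(0, 1, 2, 3, 4, 0)  # target value for each level, in the same order

# as.integer(factor(...)) gives each cell's position in lvls (NA stays NA);
# indexing codes by that position is the lookup that approxfun emulates.
# factor() flattens the matrix column by column, so matrix() restores the same layout.
recoded <- matrix(codes[as.integer(factor(sam, levels = lvls))],
                  nrow = nrow(sam), dimnames = dimnames(sam))
```

Labels not listed in lvls become NA, the same as with the factor-based original.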