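I am trying to recode all variables in my dataset that use an agree/disagree scale into numeric values. I tried mutate_all with case_when, but it then returns NA for the ID column and for variables such as var3 (see the data below). This is the code I am using: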
newdat <- olddat %>% mutate_all(funs(case_when(. == "Strongly Disagree (1)" ~ 1,
. == "Disagree (2)" ~ 2,
. == "Neutral (3)" ~ 3,
. == "Agree (4)" ~ 4,
. == "Strongly Agree (5)" ~ 5)))
What I would like to happen is the following:
Data I have
id var1 var2 var3 var4
1 Strongly Disagree (1) Agree (4) 5 Agree (4)
2 Strongly Disagree (1) Neutral (3) 6 Neutral (3)
3 Disagree (2) Neutral (3) 4 Strongly Agree (5)
4 Strongly Disagree (1) Agree (4) 9 Disagree (2)
5 Neutral (3) Agree (4) 2 Agree (4)
Data I want
id var1 var2 var3 var4
1 1 4 5 4
2 1 3 6 3
3 2 3 4 5
4 1 4 9 2
5 3 4 2 4
P.S. I tried to find an existing answer but could not. Maybe I used the wrong search terms?
Answer 0 (score: 4)
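You can simply extract the numeric code from each cell, since you have already included it in parentheses. There is no need to recode. Here is one way, using stringr::str_extract():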
library(dplyr)
library(stringr)
have %>%
  mutate_at(vars(starts_with("var")), ~as.integer(str_extract(., "[0-9]")))
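For a self-contained illustration, here is a minimal sketch of the same idea on the sample data from the question. It assumes dplyr 1.0 or later (for across() and where()); the names olddat and newdat are borrowed from the question, and the stricter pattern is a variation rather than part of the answer above. It only matches digits that sit inside parentheses, and where(is.character) leaves id and var3 untouched, so a multi-digit value in a numeric column would not be cut down to its first digit.

```r
library(dplyr)
library(stringr)

# Sample data from the question (olddat is the question's name for it)
olddat <- data.frame(
  id   = 1:5,
  var1 = c("Strongly Disagree (1)", "Strongly Disagree (1)", "Disagree (2)",
           "Strongly Disagree (1)", "Neutral (3)"),
  var2 = c("Agree (4)", "Neutral (3)", "Neutral (3)", "Agree (4)", "Agree (4)"),
  var3 = c(5, 6, 4, 9, 2),
  var4 = c("Agree (4)", "Neutral (3)", "Strongly Agree (5)", "Disagree (2)",
           "Agree (4)"),
  stringsAsFactors = FALSE
)

# Recode only the character columns: pull out the digits between "(" and ")"
newdat <- olddat %>%
  mutate(across(where(is.character),
                ~ as.integer(str_extract(.x, "(?<=\\()[0-9]+(?=\\))"))))

newdat
```

If the scale columns are stored as factors rather than character vectors, where(is.factor) would presumably be the selector to use instead.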
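Answer 1 (score: 3)
You need to use mutate_at instead of mutate_all, because you only want to change selected columns: by default, values that match none of the conditions in case_when become NA.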
library(dplyr)
df %>% mutate_at(vars(var1, var2, var4),
~(case_when(. == "Strongly Disagree (1)" ~ 1,
. == "Disagree (2)" ~ 2,
. == "Neutral (3)" ~ 3,
. == "Agree (4)" ~ 4,
. == "Strongly Agree (5)" ~ 5)))
# id var1 var2 var3 var4
#1 1 1 4 5 4
#2 2 1 3 6 3
#3 3 2 3 4 5
#4 4 1 4 9 2
#5 5 3 4 2 4
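To see why mutate_all filled id and var3 with NA, here is a small sketch of the case_when default on its own. The vector x is a made-up example that mixes scale labels with a plain number, the way var3 does once it is compared against strings.

```r
library(dplyr)

# Hypothetical vector: two scale labels and one plain number stored as text
x <- c("Agree (4)", "5", "Neutral (3)")

recoded <- case_when(
  x == "Neutral (3)" ~ 3,
  x == "Agree (4)"   ~ 4
)

# "5" matches neither condition, so its position comes back as NA
recoded
```

Restricting the recode to the scale columns, as mutate_at does above, avoids ever running case_when on columns like id or var3.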
Since there are many columns to do this for, we can first find the columns that need to be changed and then use mutate_at:
cols <- which(colSums(sapply(df, grepl, pattern = "Agree|Disagree")) > 0)
df %>%
mutate_at(cols, ~case_when(. == "Strongly Disagree (1)" ~ 1,
. == "Disagree (2)" ~ 2,
. == "Neutral (3)" ~ 3,
. == "Agree (4)" ~ 4,
. == "Strongly Agree (5)" ~ 5))
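As a rough check of what the column detection picks up, the sketch below runs the cols line on the sample data from the question, using base R only. The intermediate name hits is introduced here purely for illustration.

```r
# Sample data from the question
df <- data.frame(
  id   = 1:5,
  var1 = c("Strongly Disagree (1)", "Strongly Disagree (1)", "Disagree (2)",
           "Strongly Disagree (1)", "Neutral (3)"),
  var2 = c("Agree (4)", "Neutral (3)", "Neutral (3)", "Agree (4)", "Agree (4)"),
  var3 = c(5, 6, 4, 9, 2),
  var4 = c("Agree (4)", "Neutral (3)", "Strongly Agree (5)", "Disagree (2)",
           "Agree (4)"),
  stringsAsFactors = FALSE
)

# One logical column per variable: TRUE where the cell mentions Agree or Disagree
hits <- sapply(df, grepl, pattern = "Agree|Disagree")

# Positions of the columns with at least one hit
cols <- which(colSums(hits) > 0)
cols
```

On this data the result is positions 2, 3 and 5 (var1, var2, var4). Note that a column containing nothing but "Neutral (3)" would not match this pattern; if that can happen in your data, adding "Neutral" to the pattern is one possible adjustment.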
Answer 2 (score: 1)
This looks ugly and I am sure there are simpler solutions, but it should do the job:
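newdat <- as.data.frame(lapply(seq_along(olddat), function(x) {
  if (x %in% c(1, 4)) return(olddat[[x]])  # id and var3 are returned unchanged
  sapply(strsplit(as.character(olddat[[x]]), split = " "),
         function(y) as.numeric(gsub("[()]", "", y[length(y)])))
}))
names(newdat) <- names(olddat)  # restore the original column names
What this does is basically loop over every column. If it is the first or fourth column, the column is returned as is. Otherwise, each cell is split at the spaces with strsplit(), the last piece (for example "(1)") is taken, the parentheses are removed with gsub(), and the result is converted to a number with as.numeric().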
Edit:
If you have many columns and do not want to specify them by hand, you can filter by column class instead:
newdat <- as.data.frame(lapply(olddat, function(col) {
  if (is.numeric(col)) return(col)  # numeric columns (id, var3) are returned unchanged
  sapply(strsplit(as.character(col), split = " "),
         function(y) as.numeric(gsub("[()]", "", y[length(y)])))
}))