Question

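I'm trying to recode all the variables in my dataset that use an "agree/disagree" scale into numeric values. I tried mutate_all with case_when, but it then returns NA for the ID column and for variables like var3 (data below). This is the code I'm using: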

newdat <- olddat %>% mutate_all(funs(case_when(. == "Strongly Disagree (1)" ~ 1,
                                               . == "Disagree (2)" ~ 2,
                                               . == "Neutral (3)" ~ 3,
                                               . == "Agree (4)" ~ 4,
                                               . == "Strongly Agree (5)" ~ 5)))

What I'd like to happen is the following:

Data I have

id     var1                      var2           var3      var4
 1     Strongly Disagree (1)     Agree (4)      5         Agree (4)
 2     Strongly Disagree (1)     Neutral (3)    6         Neutral (3)
 3     Disagree (2)              Neutral (3)    4         Strongly Agree (5)
 4     Strongly Disagree (1)     Agree (4)      9         Disagree (2)
 5     Neutral (3)               Agree (4)      2         Agree (4)

Data I want

id     var1   var2   var3   var4
 1     1      4      5      4
 2     1      3      6      3
 3     2      3      4      5
 4     1      4      9      2
 5     3      4      2      4

P.S. I tried looking for an existing answer, but couldn't find one! Maybe I was phrasing it wrong?

Answer 1

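You can simply extract the numeric code from each cell, since you've already included it in parentheses. No need to recode. Here's a way to do it with stringr::str_extract():

library(dplyr)
library(stringr)
have %>% 
  mutate_at(vars(starts_with("var")), ~as.integer(str_extract(., "[0-9]")))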
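
A slightly more defensive variant of the same idea, as a sketch only. The data frame `have` below is a stand-in rebuilt from the first three rows of the question's sample, and both the `[0-9]+` pattern and the use of `across()` (which needs dplyr 1.0.0 or later) are my assumptions, not part of the answer above. Matching `[0-9]+` rather than `[0-9]` keeps multi-digit values such as 10 intact:

```r
library(dplyr)
library(stringr)

# Stand-in for the question's data (first three rows only)
have <- data.frame(
  id   = 1:3,
  var1 = c("Strongly Disagree (1)", "Strongly Disagree (1)", "Disagree (2)"),
  var2 = c("Agree (4)", "Neutral (3)", "Neutral (3)"),
  var3 = c(5, 6, 4),
  var4 = c("Agree (4)", "Neutral (3)", "Strongly Agree (5)"),
  stringsAsFactors = FALSE
)

# Convert each column to character first so numeric columns such as var3
# go through the same extraction, then pull out the first run of digits
want <- have %>%
  mutate(across(starts_with("var"),
                ~ as.integer(str_extract(as.character(.x), "[0-9]+"))))
```

The `id` column is left alone because its name does not start with "var".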

Answer 2

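You need to use mutate_at rather than mutate_all, because you only want to change selected columns. By default, any value that matches none of the conditions in case_when becomes NA, which is why id and var3 were wiped out.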

library(dplyr)

df %>% mutate_at(vars(var1, var2, var4), 
                     ~(case_when(. == "Strongly Disagree (1)" ~ 1,
                                 . == "Disagree (2)" ~ 2,
                                 . == "Neutral (3)" ~ 3,
                                 . == "Agree (4)" ~ 4,
                                 . == "Strongly Agree (5)" ~ 5)))

#  id var1 var2 var3 var4
#1  1    1    4    5    4
#2  2    1    3    6    3
#3  3    2    3    4    5
#4  4    1    4    9    2
#5  5    3    4    2    4
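
To see the NA behaviour in isolation, here is a minimal sketch. The vector `id` and the result names `lost` and `kept` are made up for illustration; the conditions are a shortened version of the ones above:

```r
library(dplyr)

id <- c(1, 2, 3)  # a numeric column like `id` or `var3`

# None of the conditions match, so every value becomes NA
lost <- case_when(id == "Agree (4)"   ~ 4,
                  id == "Neutral (3)" ~ 3)

# A final catch-all condition keeps unmatched values as they were
kept <- case_when(id == "Agree (4)"   ~ 4,
                  id == "Neutral (3)" ~ 3,
                  TRUE                ~ id)
```

The catch-all only works here because `id` is already numeric, so it has the same type as the other right-hand sides. Restricting the recode to the relevant columns with mutate_at is usually the cleaner fix.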

Since there are many columns to do this for, we can first find the columns that need changing and then pass them to mutate_at:

cols <- which(colSums(sapply(df, grepl, pattern =  "Agree|Disagree")) > 0)

df %>%
    mutate_at(cols, ~case_when(. == "Strongly Disagree (1)" ~ 1,
                    . == "Disagree (2)" ~ 2,
                    . == "Neutral (3)" ~ 3,
                    . == "Agree (4)" ~ 4,
                    . == "Strongly Agree (5)" ~ 5))

Answer 3

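This looks ugly and I'm sure there's a simpler solution, but it should do the job:

newdat <- as.data.frame(lapply(seq_along(olddat), function(x) {
  if (x %in% c(1, 4)) return(olddat[[x]])  # keep id and var3 unchanged
  # last space-separated token of each cell, e.g. "(4)", minus the parentheses
  sapply(strsplit(as.character(olddat[[x]]), split = " "),
         function(y) as.numeric(gsub("[()]", "", y[length(y)])))
}))
names(newdat) <- names(olddat)

What it does is basically loop over every column. If it's the first or fourth column, the column is returned as is. Otherwise, each cell is split on whitespace with strsplit(), the last piece is taken, the parentheses are removed with gsub(), and the result is converted to a number with as.numeric().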

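Edit:

If you have a lot of columns and don't want to specify them by hand, you can filter by column class instead:

newdat <- as.data.frame(lapply(seq_along(olddat), function(x) {
  if (is.numeric(olddat[[x]])) return(olddat[[x]])  # numeric columns stay as they are
  sapply(strsplit(as.character(olddat[[x]]), split = " "),
         function(y) as.numeric(gsub("[()]", "", y[length(y)])))
}))
names(newdat) <- names(olddat)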
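
The same class-based filtering can be sketched more compactly in base R. This is only an illustration: `extract_code` is a hypothetical helper name, `olddat` is a stand-in built from the first three rows of the question's sample, and the regular expression assumes every label ends in a parenthesised number such as "(4)":

```r
# Stand-in for the question's data (first three rows only)
olddat <- data.frame(
  id   = 1:3,
  var1 = c("Strongly Disagree (1)", "Strongly Disagree (1)", "Disagree (2)"),
  var3 = c(5, 6, 4),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the digits inside a trailing "(n)"
extract_code <- function(x) as.numeric(sub(".*\\(([0-9]+)\\)$", "\\1", x))

# Recode only the character columns; numeric columns are left untouched
newdat <- olddat
is_text <- vapply(olddat, is.character, logical(1))
newdat[is_text] <- lapply(olddat[is_text], extract_code)
```

Working on whole columns avoids the nested per-cell loop, and column names are preserved because only the selected columns are overwritten.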

Recode all variables based on a condition