How to replace words with numbers in a dataset in RStudio

Time: 2017-06-20 14:52:36

Tags: r rstudio

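I have seen a question similar to mine here, "Replacing specific column words with numbers or blanks", but none of its solutions seemed to help in my case.

What I am trying to do is convert: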

Question    Response
1           Sometimes
2           Almost Always
3           Sometimes
4           Almost Never
5           Often

into:

Question    Response
1           2
2           4
3           2
4           1
5           3

Almost Never = 1, Sometimes = 2, Often = 3, Almost Always = 4.

I imported the data from Excel, and it is in a data frame called STAI22 (I think).

I tried:

STAI22[STAI22$Response == "Almost never",]$Response = 1
STAI22[STAI22$Response == "sometimes",]$Response = 2
STAI22[STAI22$Response == "often",]$Response = 3
STAI22[STAI22$Response == "Almost always",]$Response = 4

But I got these error messages:

> STAI22[STAI22$Response == "Almost Always",]$Response = "4"
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = "4") :
  invalid factor level, NA generated
> STAI22[STAI22$Response == "Often",]$Response = "3"
Error in `[<-.data.frame`(`*tmp*`, STAI22$Response == "Often", , value = list( : 
  missing values are not allowed in subscripted assignments of data frames
> STAI22[STAI22$Response == "Sometimes",]$Response = "2"
Error in `[<-.data.frame`(`*tmp*`, STAI22$Response == "Sometimes", , value = list( : 
  missing values are not allowed in subscripted assignments of data frames
> STAI22[STAI22$Response == "Almost Never",]$Response = "1"
Error in `[<-.data.frame`(`*tmp*`, STAI22$Response == "Almost Never",  : 
  missing values are not allowed in subscripted assignments of data frames

It did nothing to my data!

2 Answers:

Answer 0 (score: 1)

You can use case_when from dplyr.

dplyr version 0.5.0

df <- read.table(text="Question    Response
1           Sometimes
2           'Almost Always'
3           Sometimes
4           'Almost Never'
5           Often",header=TRUE, stringsAsFactors=FALSE)

library(dplyr)
df%>%
  mutate(Response=case_when(
    .$Response=="Sometimes" ~ 2,
    .$Response=="Almost Always" ~ 4,
    .$Response=="Almost Never" ~ 1,
    .$Response=="Often" ~ 3
      ))
  Question Response
1        1        2
2        2        4
3        3        2
4        4        1
5        5        3

dplyr version 0.7.0

df <- read.table(text="Question    Response
1           Sometimes
2           'Almost Always'
3           Sometimes
4           'Almost Never'
5           Often",header=TRUE, stringsAsFactors=FALSE)

library(dplyr)
df%>%
  mutate(Response=case_when(
    Response=="Sometimes" ~ 2,
    Response=="Almost Always" ~ 4,
    Response=="Almost Never" ~ 1,
    Response=="Often" ~ 3
      ))
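
The warning in the question mentions `[<-.factor`, which suggests that Response was imported as a factor rather than as character strings; that would explain why assigning "4" produced NA. A minimal base R sketch of an alternative follows, assuming the four labels are spelled exactly as in the question's sample data. The name `levels_in_order` is introduced here for illustration only.

```r
# Sample data from the question, deliberately stored as a factor
df <- data.frame(
  Question = 1:5,
  Response = factor(c("Sometimes", "Almost Always", "Sometimes",
                      "Almost Never", "Often"))
)

# Labels listed in the order of the codes they should receive (1 to 4)
levels_in_order <- c("Almost Never", "Sometimes", "Often", "Almost Always")

# match() returns the position of each label in levels_in_order;
# as.character() makes it explicit that labels, not factor codes, are compared
df$Response <- match(as.character(df$Response), levels_in_order)

df
```

Any label that does not appear in `levels_in_order`, including one with different capitalisation such as "sometimes", comes back as NA.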

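Answer 1 (score: 0)

YES! By working through a few different answers I finally managed it (for anyone as rubbish at this as I am, here is a ridiculously simplified explanation of what I did):

I started with a data frame: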

Question    Response
1           Somewhat
2           Very much so
3           Somewhat
4           Not at all
5           Moderately so

I created a lookup table:

lookup <- c("Not at all" = 1, "Somewhat" = 2, "Moderately so" = 3, "Very much so" = 4)
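
A named vector like this can act as a lookup table because indexing it with a character vector returns the elements that carry those names. A short sketch of that behaviour, using made-up input (`answers` and `codes` are hypothetical names):

```r
lookup <- c("Not at all" = 1, "Somewhat" = 2,
            "Moderately so" = 3, "Very much so" = 4)

# Hypothetical responses; the last one differs in case on purpose
answers <- c("Somewhat", "Very much so", "somewhat")

# Indexing by name returns the matching values; unname() drops the labels
codes <- unname(lookup[answers])
codes
```

Names are matched exactly, so the lower-case "somewhat" finds no entry and yields NA.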

Created a new column in my dataset:

Datasetname["Response2"] <- NA #Just fills the column with NA

Question    Response         Response2
1           Somewhat            NA
2           Very much so        NA
3           Somewhat            NA
4           Not at all          NA
5           Moderately so       NA

Then added the new values to that new column:

Datasetname$Response2 <- lookup[as.character(Datasetname$Response)] #Looks up each response label in the lookup table

Question    Response            Response2
1           Somewhat            2
2           Very much so        4
3           Somewhat            2
4           Not at all          1
5           Moderately so       3

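Hooray!

Thanks, everyone, for the suggestions. For some reason this was the only approach that worked for me (I may have misunderstood some of the other suggestions).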
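
Putting the steps from this answer together, a self-contained sketch might look like the following. It assumes the column holds exactly the four labels shown and reuses the name `Datasetname` from the answer. The call to `as.character()` is there because indexing a named vector with a factor would use the factor's integer codes rather than its labels.

```r
# Rebuild the example data frame from the answer
Datasetname <- data.frame(
  Question = 1:5,
  Response = c("Somewhat", "Very much so", "Somewhat",
               "Not at all", "Moderately so"),
  stringsAsFactors = TRUE  # mimic a column that was imported as a factor
)

lookup <- c("Not at all" = 1, "Somewhat" = 2,
            "Moderately so" = 3, "Very much so" = 4)

# Look up each label by name; unname() keeps the new column free of names
Datasetname$Response2 <- unname(lookup[as.character(Datasetname$Response)])

Datasetname
```

Creating the column with NA first is not required in this version, since assigning to `Datasetname$Response2` adds the column directly.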