Error when recoding a variable in R

Time: 2015-04-19 10:34:20

Tags: r excel statistics

I imported a file from Excel. After the import, str() shows the following structure:

str(mydata)
$ Injury   : chr  "MMCAI" "MMCAI" "MMCAI" "MMCAI" ...
$ Na_RR    : num  161 152 152 150 143 ...
$ place    : chr  "core" "core" "core" "core" ...

Now I want to create 5 different groups from combinations of the variables "Injury" and "place". I have this code:

mydata$group[mydata$Injury=="MMCAI" & mydata$place=="core"]<- "IC"

However, after running the code, some observations end up classified as NA, for example:

 231    core    MMCAI   138.8168    3.253879    core    IC
 232    core    MMCAI   142.7655    3.096850    core    NA
 233    core    MMCAI   141.1135    3.066894    core    NA
 234    core    MMCAI   137.1993    2.922434    core    NA
 235    core    MMCAI   138.3765    2.848378    core    NA

I can't find the error. Any help would be greatly appreciated.

Thanks

1 answer:

Answer 0 (score: 0)

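This can happen if the variables involved contain leading or trailing whitespace. A value such as "MMCAI " is not equal to "MMCAI", so those rows are never assigned and stay NA: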

 mydata$group[with(mydata, Injury=='MMCAI' & place=='core')] <- 'IC'
 mydata
 #   Na_RR place  Injury group
 #1   231  core   MMCAI    IC
 #2   232  core  MMCAI   <NA>
 #3   233  core   MMCAI  <NA>
 #4   234  core  MMCAI   <NA>
 #5   235  core   MMCAI  <NA>
 #6   239  core   MMCPI  <NA>
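
To confirm that stray whitespace is the cause before changing anything, it can help to make the padding visible. The following is a minimal sketch using only base R; the vector `Injury` and the name `has_padding` are illustrative and simply mirror the sample data in this answer.

```r
# Illustrative values mirroring the sample data in this answer
Injury <- c("MMCAI", "MMCAI ", " MMCAI", " MMCAI ", " MMCAI", "MMCPI")

# dput() prints each value in quotes, so stray spaces become visible
dput(unique(Injury))

# Flag values that change once leading/trailing whitespace is stripped
has_padding <- Injury != gsub("^[[:space:]]+|[[:space:]]+$", "", Injury)
has_padding
# FALSE  TRUE  TRUE  TRUE  TRUE FALSE
```

If `has_padding` is TRUE for the rows that end up as NA, whitespace is the likely culprit.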

Once we strip the leading and trailing whitespace, it should work:

library(stringr)
mydata[c('place', 'Injury')] <- lapply(mydata[c('place', 'Injury')], str_trim)
mydata$group[with(mydata, Injury=='MMCAI' & place=='core')] <- 'IC'
mydata
#  Na_RR place Injury group
#1   231  core  MMCAI    IC
#2   232  core  MMCAI    IC
#3   233  core  MMCAI    IC
#4   234  core  MMCAI    IC
#5   235  core  MMCAI    IC
#6   239  core  MMCPI  <NA>
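
If adding a package is not desirable, the same cleanup can probably be done with base R. This sketch assumes R 3.2.0 or later, where `trimws()` is available, and it rebuilds the sample data so that it runs on its own.

```r
# Sample data with padded Injury values (same values as in this answer)
mydata <- data.frame(
  Na_RR  = c(231L, 232L, 233L, 234L, 235L, 239L),
  place  = rep("core", 6),
  Injury = c("MMCAI", "MMCAI ", " MMCAI", " MMCAI ", " MMCAI", "MMCPI"),
  stringsAsFactors = FALSE
)

# Strip leading/trailing whitespace from both character columns
mydata[c("place", "Injury")] <- lapply(mydata[c("place", "Injury")], trimws)

# Start from an explicit NA column, then fill in the matching rows
mydata$group <- NA_character_
mydata$group[mydata$Injury == "MMCAI" & mydata$place == "core"] <- "IC"
mydata$group
# "IC" "IC" "IC" "IC" "IC" NA
```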

Another option is to use grep, which avoids removing the whitespace.
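
The answer does not show the grep variant, so the following is only one way it might look. It uses `grepl()` with a pattern anchored at both ends that tolerates surrounding whitespace; the names `is_mmcai` and `is_core` are illustrative. The anchors matter here: an unanchored pattern such as "MMCAI" would also match longer labels that merely contain it.

```r
# Sample data with padded Injury values (same values as in this answer)
mydata <- data.frame(
  Na_RR  = c(231L, 232L, 233L, 234L, 235L, 239L),
  place  = rep("core", 6),
  Injury = c("MMCAI", "MMCAI ", " MMCAI", " MMCAI ", " MMCAI", "MMCPI"),
  stringsAsFactors = FALSE
)

# Optional whitespace, the exact label, optional whitespace
is_mmcai <- grepl("^[[:space:]]*MMCAI[[:space:]]*$", mydata$Injury)
is_core  <- grepl("^[[:space:]]*core[[:space:]]*$", mydata$place)

# The original columns are left untouched; only group is filled in
mydata$group <- NA_character_
mydata$group[is_mmcai & is_core] <- "IC"
mydata$group
# "IC" "IC" "IC" "IC" "IC" NA
```

This leaves the padded values in the data, so any later comparison with `==` would hit the same problem again; trimming once up front is usually the simpler fix.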

Data

mydata <- structure(list(Na_RR = c(231L, 232L, 233L, 234L, 235L,
 239L), 
place = c("core", "core", "core", "core", "core", "core"), 
Injury = c("MMCAI", "MMCAI ", " MMCAI", " MMCAI ", " MMCAI", 
"MMCPI")), .Names = c("Na_RR", "place", "Injury"),
row.names = c(NA,-6L), class = "data.frame")