R within() function: unexpected error when the last value is NA

Time: 2014-11-01 17:51:11

Tags: r

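I ran into some unexpected behaviour when using the within() function in R. I (eventually!) traced the cause to the case where the last element(s) of a particular column in the data frame contain NA.

I have simplified the code to create a reproducible example. Obviously, the real-world application where I hit this is far more complex (a data frame of > 500k rows and 400 columns, > 100 lines inside within(), etc.), and it would be rather inconvenient to avoid using within().

This works as expected: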

fooTest <- data.frame(Group = c("Shell", NA,  "Cup", NA,  NA),
                      CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                      LinerComposition = c("Polyethylene", NA, NA, NA, "Test"))
fooTest$Bearing <- NA
fooTest$Bearing[which(fooTest$Group=="Cup")] <-
  as.character(fooTest$CupComposition[which(fooTest$Group=="Cup")])
fooTest$Bearing[which(fooTest$Group=="Shell")] <-
  as.character(fooTest$LinerComposition[which(fooTest$Group=="Shell")])
fooTest$Bearing

However, this (which should be equivalent) throws an error:

fooTest <- data.frame(Group = c("Shell", NA,  "Cup", NA,  NA),
                      CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                      LinerComposition = c("Polyethylene", NA, NA, NA, "Test"))
fooTest <- within(fooTest, {
  Bearing <- NA
  Bearing[which(Group=="Cup")] <-
    as.character(CupComposition[which(Group=="Cup")])
  Bearing[which(Group=="Shell")] <-
    as.character(LinerComposition[which(Group=="Shell")])
})

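The error message is: Error in `[<-.data.frame`(`*tmp*`, nl, value = list(Bearing = c("Polyethylene", :   replacement element 1 has 3 rows, need 5

The last two rows, where Group is NA, are evidently not included. NA rows in the middle of the data are fine.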

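A couple of questions:

  1. The behaviour of within() is a bit unexpected; is this a bug? I am not very experienced, so I am a bit wary of filing a bug report for what may just be a gap in my own understanding!

  2. In this particular case, I expect there is a neater way to populate the "Bearing" column than the approach I have taken. Suggestions welcome!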

2 answers:

Answer 0: (score: 0)

Regarding the error message from within, you could try:

 within(fooTest, {
   Bearing <- NA
   Bearing[Group == 'Cup' & !is.na(Group)] <-
     as.character(CupComposition)[Group == 'Cup' & !is.na(Group)]
   Bearing[Group == 'Shell' & !is.na(Group)] <-
     as.character(LinerComposition)[Group == 'Shell' & !is.na(Group)]
 })
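
A likely reason this version works while the which() version fails: inside within(), `Bearing <- NA` creates a vector of length one, not a five-row column (unlike `fooTest$Bearing <- NA`, which is recycled to the number of rows). The sketch below illustrates the difference using a plain vector; the names b_int, b_lgl and b_full are made up for illustration, and the pre-allocation variant at the end is an assumption of this sketch, not part of the answer.

```r
Group <- c("Shell", NA, "Cup", NA, NA)

# Integer index from which(): a length-1 vector grows only up to the
# largest index that is assigned, so trailing NA rows are never reached.
b_int <- NA
b_int[which(Group == "Cup")] <- "Polyethylene"  # assigns position 3 only
length(b_int)  # 3, which cannot be fitted into a 5-row data frame

# Logical index as long as the data: the vector is stretched to that length.
b_lgl <- NA
b_lgl[Group == "Cup" & !is.na(Group)] <- "Polyethylene"
length(b_lgl)  # 5

# Alternative (assumption of this sketch): start from a full-length vector,
# after which the original which()-based assignments keep the right length.
b_full <- rep(NA_character_, length(Group))
b_full[which(Group == "Cup")] <- "Polyethylene"
length(b_full)  # 5
```

This also suggests why NA rows in the middle of the data cause no trouble: they lie below the largest assigned index, so they are filled with NA as the vector grows.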

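It is not clear whether the Group column and all the other columns follow some order. From the column names, I could not find a common pattern that would help match the elements in Group. Based on the example provided, you could also do the following (for a bigger dataset):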

 fooTest1 <- fooTest
 fooTest1[] <- lapply(fooTest1, as.character)#convert the columns to character class
 Un1 <- sort(unique(na.omit(fooTest1$Group)))


 m1 <-  do.call(cbind,Map(function(v, x,y)
              ifelse(v==y & !is.na(v), x, NA) , list(fooTest1[,1]),
                                       fooTest1[,-1], Un1))

 indx1 <- which(!is.na(m1), arr.ind=TRUE)[,1]
 fooTest1$Bearing <- NA
 fooTest1$Bearing[indx1] <- m1[!is.na(m1)]
 fooTest1
 #   Group CupComposition LinerComposition      Bearing
 #1 Shell          Metal     Polyethylene Polyethylene
 #2  <NA>           <NA>             <NA>         <NA>
 #3   Cup   Polyethylene             <NA> Polyethylene
 #4  <NA>           <NA>             <NA>         <NA>
 #5  <NA>           Test             Test         <NA>
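
For just the two groups in the example, a more compact option might be a nested ifelse(). This is only a sketch under the assumption that "Cup" maps to CupComposition and "Shell" to LinerComposition, as in the question; it is not part of the approach above.

```r
fooTest <- data.frame(Group = c("Shell", NA, "Cup", NA, NA),
                      CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                      LinerComposition = c("Polyethylene", NA, NA, NA, "Test"),
                      stringsAsFactors = FALSE)

# %in% returns FALSE (not NA) for NA groups, so those rows fall through
# to the final NA_character_ branch.
fooTest$Bearing <- with(fooTest,
  ifelse(Group %in% "Cup", as.character(CupComposition),
         ifelse(Group %in% "Shell", as.character(LinerComposition),
                NA_character_)))
fooTest$Bearing
```

Because ifelse() always returns a vector as long as its test, the result has one value per row regardless of where the NA groups sit.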

Answer 1: (score: 0)

I tend to use "%in%" in this situation; it handles NAs better:

fooTest <- data.frame(Group = c("Shell", NA,  "Cup", NA,  NA),
                      CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                      LinerComposition = c("Polyethylene", NA, NA, NA, "Test"))
fooTest <- within(fooTest, {
  Bearing <- NA
  Bearing[Group %in% "Cup"] <-
    as.character(CupComposition[Group %in% "Cup"])
  Bearing[Group %in% "Shell"] <-
    as.character(LinerComposition[Group %in% "Shell"])
})
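
To illustrate the difference on the example's Group values (a small sketch; the names eq and inn are made up for illustration): `==` propagates NA, while `%in%` returns FALSE for NA elements, so it gives a full-length logical index with no NAs in it.

```r
Group <- c("Shell", NA, "Cup", NA, NA)

eq  <- Group == "Cup"    # NA wherever Group is NA
inn <- Group %in% "Cup"  # FALSE wherever Group is NA

eq   # FALSE NA TRUE NA NA
inn  # FALSE FALSE TRUE FALSE FALSE
```

An NA-free logical index of full length can be used directly in an assignment, which is why no which() or !is.na() guard is needed here.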