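I ran into some unexpected behaviour when using the within() function in R. I (eventually!) tracked the cause down to cases where the last element of a particular column in the data frame is NA.
I've simplified the code to create a reproducible example. Obviously, the real-world application where I hit this is far more complex (a data frame with > 500k rows and 400 columns, > 100 lines inside within(), and so on), and it would be rather inconvenient to avoid using within().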
This works as expected:
fooTest <- data.frame(Group = c("Shell", NA, "Cup", NA, NA),
                      CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                      LinerComposition = c("Polyethylene", NA, NA, NA, "Test"))
fooTest$Bearing <- NA
fooTest$Bearing[which(fooTest$Group=="Cup")] <-
  as.character(fooTest$CupComposition[which(fooTest$Group=="Cup")])
fooTest$Bearing[which(fooTest$Group=="Shell")] <-
  as.character(fooTest$LinerComposition[which(fooTest$Group=="Shell")])
fooTest$Bearing
However, this version (which should be equivalent) throws an error:
fooTest <- data.frame(Group = c("Shell", NA, "Cup", NA, NA),
                      CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                      LinerComposition = c("Polyethylene", NA, NA, NA, "Test"))
fooTest <- within(fooTest, {
  Bearing <- NA
  Bearing[which(Group=="Cup")] <-
    as.character(CupComposition[which(Group=="Cup")])
  Bearing[which(Group=="Shell")] <-
    as.character(LinerComposition[which(Group=="Shell")])
})
The error message is:
Error in `[<-.data.frame`(`*tmp*`, nl, value = list(Bearing = c("Polyethylene",  :
  replacement element 1 has 3 rows, need 5
The last two rows, where Group is NA, are evidently not included. NA rows in the middle of the data cause no problem.
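The length mismatch can be reproduced with plain vectors. This is a minimal sketch, assuming (as the error message suggests) that within() evaluates the block on ordinary vectors and only checks lengths when it copies the new variables back into the data frame. A new variable such as Bearing starts with length 1, and assigning through an integer index stretches it only as far as the largest index used:

```r
# Plain-vector reproduction of what appears to happen inside within()
Group <- c("Shell", NA, "Cup", NA, NA)

Bearing <- NA                     # length 1, not length(Group)
which(Group == "Cup")             # 3: which() silently drops the NA comparisons
Bearing[which(Group == "Cup")] <- "Polyethylene"
Bearing[which(Group == "Shell")] <- "Polyethylene"  # index 1, no further stretching

length(Bearing)                   # 3, but the data frame has 5 rows
```

If a matched row had been the last one, the integer index would happen to stretch Bearing to all five rows, which would explain why only trailing NA rows trigger the error.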
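A couple of questions:
Is this behaviour of within() a bug? It seems somewhat unexpected, but I'm not very experienced, so I'm a little wary of filing a bug report when it may well be a gap in my understanding!
In this particular case, I'd like a neater way of filling in the "Bearing" column than the approach I've taken. Suggestions welcome!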
Answer 0 (score: 0)
Regarding the error message when using within, you could try:
within(fooTest, {
  Bearing <- NA
  Bearing[Group=='Cup' & !is.na(Group)] <-
    as.character(CupComposition)[Group=='Cup' & !is.na(Group)]
  Bearing[Group=='Shell' & !is.na(Group)] <-
    as.character(LinerComposition)[Group=='Shell' & !is.na(Group)]
})
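This version appears to work because of the logical subscript. A logical index that is longer than the vector being assigned into stretches that vector to the full length of the index, so Bearing ends up with one element per row. A small sketch on plain vectors (the names isCup and Bearing are illustrative only):

```r
Group <- c("Shell", NA, "Cup", NA, NA)

# NA & FALSE is FALSE, so this logical vector has no NAs and length 5
isCup <- Group == "Cup" & !is.na(Group)

Bearing <- NA                     # length 1
Bearing[isCup] <- "Polyethylene"  # stretched to length(isCup), i.e. 5
length(Bearing)
```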
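It is not clear whether the Group column and the other columns follow some particular order. From the column names, I couldn't find a common pattern that would help match the elements in Group. Based on the example provided, you could also try the following (for bigger datasets):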
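fooTest1 <- fooTest
fooTest1[] <- lapply(fooTest1, as.character) # convert the columns to character class
# sorted group names ("Cup", "Shell"), paired by position with columns 2 and 3
Un1 <- sort(unique(na.omit(fooTest1$Group)))
# one column per group: the source value where Group matches, otherwise NA
m1 <- do.call(cbind, Map(function(v, x, y)
  ifelse(v==y & !is.na(v), x, NA), list(fooTest1[,1]),
  fooTest1[,-1], Un1))
# row of each non-NA cell, in the same column-major order as m1[!is.na(m1)]
indx1 <- which(!is.na(m1), arr.ind=TRUE)[,1]
fooTest1$Bearing <- NA
fooTest1$Bearing[indx1] <- m1[!is.na(m1)]
fooTest1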
#   Group CupComposition LinerComposition      Bearing
# 1 Shell          Metal     Polyethylene Polyethylene
# 2  <NA>           <NA>             <NA>         <NA>
# 3   Cup   Polyethylene             <NA> Polyethylene
# 4  <NA>           <NA>             <NA>         <NA>
# 5  <NA>           Test             Test         <NA>
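If the Group-to-column pairing does not follow the sorted order that Un1 relies on, an explicit lookup table is one alternative. This is only a sketch: sourceCol is a hypothetical name, and stringsAsFactors = FALSE is set so the copied values stay character on R versions before 4.0:

```r
fooTest <- data.frame(Group = c("Shell", NA, "Cup", NA, NA),
                      CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                      LinerComposition = c("Polyethylene", NA, NA, NA, "Test"),
                      stringsAsFactors = FALSE)

# Hypothetical lookup table: for each Group value, the column that supplies Bearing
sourceCol <- c(Cup = "CupComposition", Shell = "LinerComposition")

fooTest$Bearing <- NA_character_            # recycled to nrow(fooTest)
for (g in names(sourceCol)) {
  rows <- fooTest$Group %in% g              # FALSE (not NA) where Group is NA
  fooTest$Bearing[rows] <- fooTest[[sourceCol[[g]]]][rows]
}
fooTest
```

Adding a group then means adding one entry to sourceCol rather than relying on column order.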
Answer 1 (score: 0)
I tend to use "%in%" in this kind of situation; it handles NAs better:
fooTest <- data.frame(Group = c("Shell", NA, "Cup", NA, NA),
                      CupComposition = c("Metal", NA, "Polyethylene", NA, "Test"),
                      LinerComposition = c("Polyethylene", NA, NA, NA, "Test"))
fooTest <- within(fooTest, {
  Bearing <- NA
  Bearing[Group %in% "Cup"] <-
    as.character(CupComposition[Group %in% "Cup"])
  Bearing[Group %in% "Shell"] <-
    as.character(LinerComposition[Group %in% "Shell"])
})
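To see what "handles NAs better" means here: == propagates NA, whereas %in% (defined in terms of match()) returns FALSE for an NA element that is not in the table. The result is a full-length logical vector with no NAs, which is safe to use as a subscript:

```r
Group <- c("Shell", NA, "Cup", NA, NA)

eqCup <- Group == "Cup"     # FALSE    NA  TRUE    NA    NA
inCup <- Group %in% "Cup"   # FALSE FALSE  TRUE FALSE FALSE

anyNA(eqCup)                # TRUE
anyNA(inCup)                # FALSE
```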