Question

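I have a data frame, and I want to create a new 0/1 column (representing the absence/presence of a species) based on the records in previous columns. I have been trying this: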

update_cat$bobpresent <- NA #creating the new column

x <- c("update_cat$bob1999", "update_cat$bob2000", "update_cat$bob2001","update_cat$bob2002", "update_cat$bob2003", "update_cat$bob2004", "update_cat$bob2005", "update_cat$bob2006","update_cat$bob2007", "update_cat$bob2008", "update_cat$bob2009") #these are the names of the columns I want the new column to base its results in

bobpresent <- function(x){
  if(x==NA)
    return(0)
  else
    return(1)
} # if all the previous columns are NA then the new column should be 0, otherwise it should be 1

update_cat$bobpresence <- sapply(update_cat$bobpresent, bobpresent) #apply the new column

Everything goes fine until the last line, where I get this error:

Error in if (x == NA) return(0) else return(1) : 
  missing value where TRUE/FALSE needed

Can anyone tell me what I am doing wrong? Thank you very much for your help.

Answer 1

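By definition, comparisons with NA (like most other operations on NA) yield NA, so x == NA always evaluates to NA. If you want to check whether a value is NA, you must use the is.na function, for example: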

> NA == NA
[1] NA
> is.na(NA)
[1] TRUE

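The if statement inside the function you pass to sapply needs TRUE or FALSE as its condition, but it gets NA instead, hence the error message. You can fix this by rewriting your function: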

bobpresent <- function(x) { ifelse(is.na(x), 0, 1) }
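As a quick check, the rewritten function can be run on a small vector. The vector `counts` below is made up for illustration and does not come from the original post. Since `is.na` and `ifelse` are both vectorized, calling the function directly should give the same result as going through `sapply`.

```r
# Hypothetical example vector (not from the original post)
counts <- c(3, NA, 0, NA)

bobpresent <- function(x) { ifelse(is.na(x), 0, 1) }

# is.na() and ifelse() are vectorized, so the function can be called directly
direct <- bobpresent(counts)

# sapply() calls the function once per element and collects the results
via_sapply <- sapply(counts, bobpresent)

print(direct)
print(via_sapply)
```

Note that a recorded value of 0 is not NA, so it still maps to 1 here.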

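In any case, I don't understand from your original post what you are trying to do. This change only fixes the error you get from sapply; fixing the logic of your program is a different matter, and there is not enough information in your post to do that.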
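If the intent is the one stated in the question's code comment (0 when every yearly column is NA, 1 otherwise), one possible approach is to count the non-missing entries per row. This is only a sketch under that assumption: the data below are invented, only three of the yearly columns are shown for brevity, and `year_cols` and `n_records` are names introduced here for illustration.

```r
# Hypothetical data; only the bobYYYY column naming follows the question
update_cat <- data.frame(
  site    = c("A", "B", "C"),
  bob1999 = c(2, NA, NA),
  bob2000 = c(NA, NA, 1),
  bob2001 = c(5, NA, NA)
)

# Build the column names instead of typing each one as a string
year_cols <- paste0("bob", 1999:2001)

# Count the non-missing entries in each row across the yearly columns
n_records <- rowSums(!is.na(update_cat[year_cols]))

# Presence (1) if at least one year has a record, absence (0) otherwise
update_cat$bobpresent <- as.integer(n_records > 0)

print(update_cat)
```

For the full data set in the question, the range would presumably be `1999:2009`. Selecting columns by name with `update_cat[year_cols]` avoids the problem in the original attempt, where `x` held strings such as `"update_cat$bob1999"` rather than the columns themselves.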

Create a new column with binary data based on several columns
