Question

I have a data frame with many variables. Here is a shortened version of what I have so far:

n_20010_0_0 <- c(1,2,3,4)
n_20010_0_1 <- c(0, -2, NA, 4)  
n_20010_0_2 <- c(3, 0, -7, 2)

x <- data.frame (n_20010_0_0, n_20010_0_1, n_20010_0_2)

I created a new variable that indicates whether any of the listed variables equals 1:

 MotherIllness0 <- paste("n_20010_0_", 0:2, sep = "")
 x$MotherCAD_0_0 <- apply(x, 1, function(x) as.integer(any(x[MotherIllness0] == 1, na.rm = TRUE)))

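I want NAs to keep counting as 0, but I also want to recode so that the new value is NA whenever a row contains -7. This is what I tried, and it does not work: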

x$MotherCAD_0_0[MotherIllness0 == -7] <- NA

Answer 1

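You don't need to index by MotherIllness0 inside the function: the argument 1 (MARGIN = 1) makes apply pass one row at a time, so it is enough to select those columns once, in the data handed to apply.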

Here is a single apply call that does both things you want:

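MotherIllness0 <- paste("n_20010_0_", 0:2, sep = "")
# NA if the row contains -7; otherwise 1 if it contains a 1, else 0.
# na.rm = TRUE keeps a missing value from turning the -7 test itself into NA.
x$MotherCAD_0_0 <- apply(x[, MotherIllness0], 1, function(x) ifelse(any(x == -7, na.rm = TRUE), NA,
                                                     as.integer(any(x == 1, na.rm = TRUE))))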

I assumed that a row containing both 1 and -7 should get NA in the new variable. If not, then this should work:

# Here 1 takes precedence: NA only when the row contains -7 and no 1.
x$MotherCAD_0_0 <- apply(x[, MotherIllness0], 1, function(x) ifelse(any(x == 1, na.rm = TRUE), 1,
                                                 ifelse(any(x == -7, na.rm = TRUE), NA, 0)))

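Note that with the example above, these two lines should produce the same result, because no row there contains both 1 and -7.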
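
To see where the two variants diverge, here is a minimal sketch. The two extra rows (one with both 1 and -7, one with an NA but no -7) and the names na_first and one_first are assumptions added for illustration; they are not part of the question's data. Both tests use na.rm = TRUE so that a missing value is simply ignored.

```r
# Example data from the question, plus two illustrative rows (assumed):
# row 5 has both 1 and -7, row 6 has an NA but no -7.
x <- data.frame(n_20010_0_0 = c(1, 2, 3, 4, 1, 2),
                n_20010_0_1 = c(0, -2, NA, 4, 0, NA),
                n_20010_0_2 = c(3, 0, -7, 2, -7, 0))
MotherIllness0 <- paste("n_20010_0_", 0:2, sep = "")

# Variant 1: -7 wins, so a row with both 1 and -7 becomes NA.
na_first <- apply(x[, MotherIllness0], 1, function(r) {
  ifelse(any(r == -7, na.rm = TRUE), NA, as.integer(any(r == 1, na.rm = TRUE)))
})

# Variant 2: 1 wins, so a row with both 1 and -7 becomes 1.
one_first <- apply(x[, MotherIllness0], 1, function(r) {
  ifelse(any(r == 1, na.rm = TRUE), 1, ifelse(any(r == -7, na.rm = TRUE), NA, 0))
})

# Side-by-side comparison of the two recodings.
print(data.frame(na_first, one_first))
```

The two columns agree on every row except row 5, where the first variant gives NA and the second gives 1. Row 6 is 0 in both, which matches the requirement that an NA on its own counts as 0.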

Answer 2

Here is another approach that does not use any if-else logic:

# Here's your dataset, with a row including both 1 and -7 added:
x <- data.frame(n_20010_0_0 = c(1, 2, 3, 4, 1),
                n_20010_0_1 = c(0, -2, NA, 4, 0),
                n_20010_0_2 = c(3, 0, -7, 2, -7))

# Your original function:
MotherIllness0 <- paste("n_20010_0_", 0:2, sep = "")

x$MotherCAD_0_0 <- apply(x, MARGIN = 1, FUN = function(x) {
    as.integer(
        any(x[MotherIllness0] == 1, na.rm = TRUE)
    )
})

# A version without if-else: 1 only when the row contains a 1 and no -7
x$test <- apply(x, MARGIN = 1, FUN = function(row) {

    as.integer(
        any(row[MotherIllness0] == 1, na.rm = TRUE) & 
        !any(row[MotherIllness0] == -7, na.rm = TRUE)
    )

})
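
Note that the test column above is 0, not NA, for rows containing -7. If NA is wanted there, as the question asks, one possible sketch that still avoids if-else is to build the 0/1 indicator first and then overwrite the flagged rows. The helper names has_one and has_neg7 are hypothetical, and rowSums is swapped in for apply here as an assumption about what reads more simply.

```r
x <- data.frame(n_20010_0_0 = c(1, 2, 3, 4, 1),
                n_20010_0_1 = c(0, -2, NA, 4, 0),
                n_20010_0_2 = c(3, 0, -7, 2, -7))
MotherIllness0 <- paste("n_20010_0_", 0:2, sep = "")

# Row-wise flags computed on the whole block of columns at once;
# na.rm = TRUE makes missing values count as "no match".
has_one  <- rowSums(x[, MotherIllness0] == 1,  na.rm = TRUE) > 0
has_neg7 <- rowSums(x[, MotherIllness0] == -7, na.rm = TRUE) > 0

# Start from the 0/1 indicator, then blank out the rows that contain -7.
x$MotherCAD_0_0 <- as.integer(has_one)
x$MotherCAD_0_0[has_neg7] <- NA
```

With this data, rows 3 and 5 (the ones containing -7) end up NA, row 1 is 1, and rows 2 and 4 are 0.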

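A few notes: the argument name in an anonymous function such as function(x) can be anything, and you can save yourself a lot of confusion by naming it after what it is (I named it row above).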

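It is also unlikely that you actually need to convert the resulting column to integer: logical columns are easier to interpret, and they behave just like 0-1 columns in arithmetic (for example, TRUE + FALSE equals 1).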
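
As a small illustration of that last point, the sketch below uses a made-up logical vector (has_illness is a hypothetical name, and its values are invented) to show that counting and averaging work directly on logicals.

```r
# Hypothetical flag column; the values are made up for illustration.
has_illness <- c(TRUE, FALSE, NA, FALSE, TRUE)

TRUE + FALSE                      # logicals coerce to 1 and 0 in arithmetic
sum(has_illness, na.rm = TRUE)    # number of TRUE values
mean(has_illness, na.rm = TRUE)   # proportion of TRUE among non-missing values
as.integer(has_illness)           # explicit conversion, only if 0/1 is required
```

The conversion on the last line is only worth doing when something downstream strictly requires a 0/1 column.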

Assigning NAs to values in a list
