Question

I want to create a new column in a data frame with > 900,000 rows, where the value should be set to:

NA, if any value in a set of columns is NA

df$newcol[is.na(df$somecol_1) | is.na(df$somecol_2) | is.na(df$somecol_3)] <- NA

0, if all values in a set of columns are 0

df$newcol[df$somecol_1==0 & df$somecol_2==0 & df$somecol_3==0] <- 0

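1, if any value in a set of columns is 1 and none of the columns is NA. This is the tricky part, because with my ten columns it creates a huge number of combinations. The whole data frame has > 50 columns, ten of which I am interested in for this procedure; here I show it with only three: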

df$newcol[df$somecol_1==1 & df$somecol_2==0 & df$somecol_3==0] <- 1
df$newcol[df$somecol_1==1 & df$somecol_2==1 & df$somecol_3==0] <- 1
df$newcol[df$somecol_1==1 & df$somecol_2==1 & df$somecol_3==1] <- 1
df$newcol[df$somecol_1==0 & df$somecol_2==1 & df$somecol_3==0] <- 1
df$newcol[df$somecol_1==0 & df$somecol_2==1 & df$somecol_3==1] <- 1
df$newcol[df$somecol_1==0 & df$somecol_2==0 & df$somecol_3==1] <- 1
df$newcol[df$somecol_1==1 & df$somecol_2==0 & df$somecol_3==1] <- 1

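I feel like I am overthinking this. There must be a way to make case 3 easier? Writing out the different column combinations as shown above would take forever with ten columns. And since the dataset is quite large, a loop would be too slow.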

Dummy data:

df <- NULL
df$somecol_1 <- c(1,0,0,NA,0,1,0,NA,1,1)
df$somecol_2 <- c(NA,1,0,0,0,1,0,NA,0,0)
df$somecol_3 <- c(0,0,0,0,0,0,0,0,0,0)
df <- as.data.frame(df)

Based on the above, I want the new column to be

df$newcol <- c(NA,1,0,NA,0,1,0,NA,1,1)

Answer 1

We can use rowSums

nm1 <- grep('somecol', names(df))
df$newcol <- NA^(rowSums(is.na(df[nm1])) > 0) *(rowSums(df[nm1], na.rm = TRUE) > 0)
df$newcol
#[1] NA  1  0 NA  0  1  0 NA  1  1
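
To make the one-liner above easier to follow, here is a minimal sketch that splits it into named steps on the sample data. The intermediate names (`cols`, `has_na`, `has_one`, `na_mask`) are introduced here for illustration only and are not part of the original answer.

```r
# Sample data from the question
df <- data.frame(
  somecol_1 = c(1, 0, 0, NA, 0, 1, 0, NA, 1, 1),
  somecol_2 = c(NA, 1, 0, 0, 0, 1, 0, NA, 0, 0),
  somecol_3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
)
cols <- grep("somecol", names(df))

# TRUE for rows with at least one NA in the selected columns
has_na <- rowSums(is.na(df[cols])) > 0

# TRUE for rows with at least one 1 (NAs are ignored while summing)
has_one <- rowSums(df[cols], na.rm = TRUE) > 0

# NA^TRUE is NA^1 = NA, while NA^FALSE is NA^0 = 1,
# so na_mask is NA for rows with a missing value and 1 otherwise
na_mask <- NA^has_na

# Multiplying keeps NA where na_mask is NA and otherwise yields 0 or 1
newcol <- na_mask * has_one
```

Because every step is a vectorized operation over whole columns, this should scale to the full data frame without an explicit loop.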

Data

df <- structure(list(somecol_1 = c(1, 0, 0, NA, 0, 1, 0, NA, 1, 1), 
    somecol_2 = c(NA, 1, 0, 0, 0, 1, 0, NA, 0, 0), somecol_3 = c(0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0)), class = "data.frame", row.names = c(NA, 
-10L))

Answer 2

df$newcol = ifelse(apply(df,1,sum)>=1,1,0)

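This should do it. First, apply sums each row. Then: whenever a row contains an NA value (the first case), the sum is NA, so the comparison and ifelse also return NA; when there are only 0s (the second case), the sum is 0 (which is not >=1), so ifelse's third argument sets the new entry to 0; and when there is at least one 1, the sum is greater than or equal to 1, so ifelse's second argument sets the new entry to 1.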
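
The three cases described above can be traced with a short sketch on the question's sample data. The variable `row_total` is a name chosen here for illustration.

```r
# Sample data from the question
df <- data.frame(
  somecol_1 = c(1, 0, 0, NA, 0, 1, 0, NA, 1, 1),
  somecol_2 = c(NA, 1, 0, 0, 0, 1, 0, NA, 0, 0),
  somecol_3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
)

# Sum of each row; sum() returns NA as soon as the row contains an NA
row_total <- apply(df, 1, sum)

# NA >= 1 is NA, so ifelse() passes the NA through unchanged;
# otherwise the second argument (1) or third argument (0) is used
newcol <- ifelse(row_total >= 1, 1, 0)
```

Note that this relies on the values being only NA, 0, or 1, and on `df` containing only the columns of interest at the time `apply` is called.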

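Edit: since you only want to run these conditions on certain columns, for example columns 1-7, 9 and 23-24, you can apply the code to just that part of df: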

df$newcol = as.numeric(rowSums(df[,c(1:7,9,23:24)])>=1)

Note: I used the simplified code from Akrun's and Gregor's answers.

If you prefer, you can select the columns by name instead: Extracting specific columns from a data frame
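
As a rough illustration of selecting by name, the sketch below adds a hypothetical non-numeric column (`other_col`, not in the original data) to show that only the named columns enter the sum. The vector `cols` is likewise an assumed name.

```r
# Sample data from the question, plus a hypothetical unrelated column
df <- data.frame(
  somecol_1 = c(1, 0, 0, NA, 0, 1, 0, NA, 1, 1),
  somecol_2 = c(NA, 1, 0, 0, 0, 1, 0, NA, 0, 0),
  somecol_3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  other_col = letters[1:10]
)

# Columns of interest, listed by name rather than by position
cols <- c("somecol_1", "somecol_2", "somecol_3")

# Only the named columns are summed; other_col is left out
df$newcol <- as.numeric(rowSums(df[, cols]) >= 1)
```

Selecting by name keeps the code working if columns are later reordered or new ones are inserted, which positional indices such as `c(1:7, 9, 23:24)` would not.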

Answer 3

as.numeric(rowSums(df) >= 1) 
#[1] NA  1  0 NA  0  1  0 NA  1  1

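rowSums will give NA if any values are missing. If all values are 0 the sum is 0, so the result is 0; otherwise the result is 1 (assuming your data is all NA, 0, or 1).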

(Using akrun's sample data)
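
Since this approach depends on the data containing only NA, 0, or 1, it may be worth checking that assumption first. The following is a sketch under that assumption; the names `cols`, `values`, and `expected` are introduced here for illustration.

```r
# Sample data from the question
df <- data.frame(
  somecol_1 = c(1, 0, 0, NA, 0, 1, 0, NA, 1, 1),
  somecol_2 = c(NA, 1, 0, 0, 0, 1, 0, NA, 0, 0),
  somecol_3 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
)
cols <- c("somecol_1", "somecol_2", "somecol_3")

# Verify the assumption that every value is NA, 0, or 1
values <- unlist(df[cols])
if (!all(values %in% c(0, 1, NA))) {
  stop("columns contain values other than NA, 0, or 1")
}

# Same computation as in the answer, restricted to the chosen columns
newcol <- as.numeric(rowSums(df[cols]) >= 1)

# Compare against the result requested in the question
expected <- c(NA, 1, 0, NA, 0, 1, 0, NA, 1, 1)
matches <- identical(newcol, expected)
```

If the columns could hold other values (for example negative numbers), the `>= 1` test on the row sum would no longer be equivalent to "any value is 1".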

How to assign a value to a new column in R based on values in other columns
