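I have data in the following format, and I am trying to create a new variable holding the total number of deaths for each observation, where "present" means the activity is still ongoing: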
  Birth1  Death1 Birth2  Death2 Birth3  Death3 Birth4 Death4 Birth5 Death5 Birth6 Death6
1   1990 present
2   1984    1986   1986 present
3   1985    1988   1988 present
4   1987    1991   1991    1994   1996 present
5   1987    1989   1989 present
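I tried data$num.deaths <- ifelse(data$death1=="present", 0, 1), but that obviously does not capture observations with more than one death event. I also tried nested ifelse calls and got the same result. Can anyone point me to a quick and efficient way to do this?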
Answer 0 (score: 3)
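Extract the columns that represent deaths, giving Deaths, and then for each row count the elements that are not NA, not the empty string and not equal to "present". No packages are used.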
Deaths <- data[grep("Death", names(data))]
rowSums(!is.na(Deaths) & Deaths != "" & Deaths != "present")
## A B C D
## 0 1 1 2
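To get the new variable the question asks for, the counts can be assigned back to the data frame. This is only a sketch: it assumes the same input as the dput output shown further down in this answer, and it borrows the column name num.deaths from the question's own attempt.

```r
# Input equivalent to the dput() output in this answer
data <- data.frame(
  Birth1 = c(1990L, 1984L, 1985L, 1987L),
  Death1 = c("present", "1986", "1988", "1991"),
  Birth2 = c(NA, 1986L, 1988L, 1991L),
  Death2 = c("", "present", "present", "1994"),
  Birth3 = c(NA, NA, NA, 1996L),
  Death3 = c("", "", "", "present"),
  row.names = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Select only the Death* columns; grep() is case-sensitive, so the new
# num.deaths column is not picked up if this is run a second time
Deaths <- data[grep("Death", names(data))]

# Count, per row, the cells that hold an actual death year
counts <- rowSums(!is.na(Deaths) & Deaths != "" & Deaths != "present")

# Drop the row names so the column is a plain numeric vector
data$num.deaths <- unname(counts)
data$num.deaths
```

Because the comparison is done on the whole Deaths data frame at once, this works unchanged for any number of Death columns, which is where a single ifelse on Death1 falls short.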
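An alternative to the rowSums line that gives the same result is to check whether each cell of each row contains a digit and then add up the successes in each row. apply returns the rows as columns, so we can use colSums to do this.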
colSums(apply(Deaths, 1, grepl, pattern = "\\d"))
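To see why colSums rather than rowSums is needed here, it helps to look at the shape of what apply returns. The sketch below rebuilds just the three Death columns from the input used in this answer; has_digit is a name introduced only for illustration.

```r
# The Death* columns from the answer's input data
Deaths <- data.frame(
  Death1 = c("present", "1986", "1988", "1991"),
  Death2 = c("", "present", "present", "1994"),
  Death3 = c("", "", "", "present"),
  row.names = c("A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# apply() over rows (MARGIN = 1) passes each row to grepl() as x and
# stores each result as a COLUMN of the output matrix
has_digit <- apply(Deaths, 1, grepl, pattern = "\\d")

# 3 rows (one per Death column) by 4 columns (one per observation)
dim(has_digit)

# Summing down each column therefore gives the count per observation
colSums(has_digit)
```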
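Note: It is best to show the output of dput(data) in your question so that your input is conveyed unambiguously and reproducibly. Without it there may be small differences between what you have and what the answer assumes, so for reproducibility we used the following as input (it corresponds to the original input data and sample output shown in the question before it was edited):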
data <-
structure(list(Birth1 = c(1990L, 1984L, 1985L, 1987L), Death1 = c("present",
"1986", "1988", "1991"), Birth2 = c(NA, 1986L, 1988L, 1991L),
Death2 = c("", "present", "present", "1994"), Birth3 = c(NA,
NA, NA, 1996L), Death3 = c("", "", "", "present")), .Names = c("Birth1",
"Death1", "Birth2", "Death2", "Birth3", "Death3"),
class = "data.frame", row.names = c("A", "B", "C", "D"))
which looks like this:
> data
  Birth1  Death1 Birth2  Death2 Birth3  Death3
A   1990 present     NA             NA
B   1984    1986   1986 present     NA
C   1985    1988   1988 present     NA
D   1987    1991   1991    1994   1996 present
Answer 1 (score: 0)
Here is another option with Reduce and +. We loop through the columns with lapply, convert the elements to binary (0/1) by checking whether each element consists only of digits, and then use Reduce to sum the corresponding elements of each row.
Reduce(`+`,lapply(data[grep('Death', names(data))],
grepl, pattern='^\\d+$'))
#[1] 0 1 1 2
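The two steps can also be written out separately to make the intermediate values visible. This is a sketch using the three Death columns from the other answer's input; flags and total are names introduced only for illustration.

```r
# The Death* columns, as a plain list of character vectors
Deaths <- list(
  Death1 = c("present", "1986", "1988", "1991"),
  Death2 = c("", "present", "present", "1994"),
  Death3 = c("", "", "", "present")
)

# Step 1: one logical vector per column, TRUE where the cell is all digits
flags <- lapply(Deaths, grepl, pattern = "^\\d+$")

# Step 2: Reduce() folds `+` over the list, which is the same as
# flags$Death1 + flags$Death2 + flags$Death3 (TRUE counts as 1)
total <- Reduce(`+`, flags)
total
```

Note that the anchored pattern ^\\d+$ is stricter than a bare \\d: a cell must consist entirely of digits to count, whereas \\d matches any cell that contains a digit anywhere.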
NOTE: The example is taken from the dput output in @G. Grothendieck's post.