I have a data set with two columns: true.de.status and decision.de. It can be reproduced as follows:
dat = structure(c(0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1,
0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,
0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1,
0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0), .Dim = c(100L,
2L), .Dimnames = list(NULL, c("true.de.status", "decision.de"
)))
The first few rows of dat are:
true.de.status decision.de
[1,] 0 0
[2,] 0 0
[3,] 1 1
[4,] 0 1
[5,] 1 0
[6,] 0 0
[7,] 1 1
[8,] 1 0
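Now I want to draw a plot with the number of genes on the x-axis (i.e. how many rows of dat have been included so far) and the number of true positives on the y-axis. The x-axis is easy: seq(0,100) gives me 0, 1, ..., 100 genes. For the y-axis I need to compute the values from the two columns true.de.status and decision.de: as I walk through the rows, I can count how the true positives accumulate as the number of genes (rows) increases. For example,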
first 1 gene included: True positive (TP) = 0
first 2 genes included: TP = 0
first 3 genes included: TP = 1 (since both columns have 1 and they match)
first 4 genes included: TP = 1 (`decision.de` is 1, but `true.de.status` is 0, so it is a false positive)
first 5 genes included: TP = 1 (two columns don't match)
......
Is there a simple way to manipulate dat and return a vector of the same length as dim(dat)[1] that holds the running number of true positives? Thanks!
Answer 0 (score: 1)
It looks like you want
df <- as.data.frame(dat)
df$TP <- cumsum(as.numeric(df$true.de.status == 1 & df$decision.de == 1))
This returns the cumulative count of rows in which both columns equal 1, i.e. the running number of true positives.
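As a quick sanity check, the same expression can be run on just the eight rows printed in the question and compared with the TP values worked out by hand there. This is only a sketch: the name head_rows is made up for illustration, and the data are the eight rows shown in the question rather than the full dat.

```r
# Hypothetical stand-in for `dat`: only the 8 rows printed in the question.
head_rows <- data.frame(
  true.de.status = c(0, 0, 1, 0, 1, 0, 1, 1),
  decision.de    = c(0, 0, 1, 1, 0, 0, 1, 0)
)

# A row is a true positive when both columns equal 1;
# cumsum() turns the 0/1 indicator into a running count.
head_rows$TP <- cumsum(as.numeric(head_rows$true.de.status == 1 &
                                  head_rows$decision.de == 1))

print(head_rows$TP)
# [1] 0 0 1 1 1 1 2 2
```

The first five values (0, 0, 1, 1, 1) agree with the counts listed in the question.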
Answer 1 (score: 1)
See whether this is what you want:
plot( cumsum( dat[ , "true.de.status"] == 1 &
dat[ , "decision.de"] == 1) ,
type="s")
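(By default the x values are 1:100. If you want points or lines instead, you can change the type argument. Obviously you can assign the cumsum values to a name with vec <- ...)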
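If the x-axis should start at 0 genes, as the seq(0,100) in the question suggests, one possible approach is to prepend a 0 to the cumulative counts so that both vectors have the same length. The sketch below assumes that reading of the question. The names dat_head, tp, n_genes and tp_from_zero are illustrative only, and the data are the eight rows printed in the question rather than the full dat.

```r
# Hypothetical stand-in for `dat`: only the 8 rows printed in the question.
dat_head <- cbind(
  true.de.status = c(0, 0, 1, 0, 1, 0, 1, 1),
  decision.de    = c(0, 0, 1, 1, 0, 0, 1, 0)
)

# Running number of true positives after each row (gene).
tp <- cumsum(dat_head[, "true.de.status"] == 1 &
             dat_head[, "decision.de"] == 1)

# Start at "0 genes included, 0 true positives" so x runs from 0 to nrow.
n_genes      <- 0:nrow(dat_head)
tp_from_zero <- c(0, tp)

plot(n_genes, tp_from_zero, type = "s",
     xlab = "Number of genes", ylab = "True positives")
```

With the full dat, the same steps would give x values 0:100 and a y vector of length 101.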