How to apply a function to each row of a data frame?

Posted: 2019-10-25 23:47:02

Tags: r

I want to compute meta p-values (combined p-values) for many rows using the metap package.

My data frame has three p-values per row (columns G, E and B):

    > dput(head(tt))
structure(list(RS = c("rs2089177", "rs4360974", "rs6502526", 
"rs8069906", "rs9905280", "rs4313843"), G = c(0.9986, 0.9738, 
0.9744, 0.7184, 0.7205, 0.9804), E = c(0.7153, 0.7838, 0.7839, 
0.4918, 0.4861, 0.8522), B = c(0.604716, 0.430228, 0.42916, 0.521452, 
0.465758, 0.474313)), class = c("data.table", "data.frame"), row.names = c(NA, 
-6L))

and a second data frame holding the corresponding weight for each p-value in tt:

   > dput(head(df))
structure(list(wg = c(40.6324993078201, 40.6324993078201, 40.6324993078201, 
 40.6324993078201, 40.6324993078201, 40.6324993078201), we = c(35.3977400408557, 
35.3977400408557, 35.3977400408557, 35.3977400408557, 35.3977400408557, 
35.3977400408557), wb = c(580.643608420863, 580.643608420863, 
580.643608420863, 580.643608420863, 580.643608420863, 580.643608420863
), RS = c("rs2089177", "rs4360974", "rs6502526", "rs8069906", 
"rs9905280", "rs4313843")), row.names = c(NA, 6L), class = "data.frame")

The RS column is the same in df and tt.

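How can I use the sumz() function to create a new data frame that looks the same as tt, except that it has an additional column, named e.g. "META", holding the meta p-value computed for each row?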

Here is an example of the computation for the p-values in the first row:

 > sumz(c(0.9986,0.7153,0.604716), weights = c(40.6325,35.39774,580.6436), na.action = na.fail)
p =  0.6940048

This is the function I am referring to: https://www.rdocumentation.org/packages/metap/versions/1.1/topics/sumz
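For reference, sumz() combines p-values with the weighted Z (Stouffer) method: each p-value is turned into z_i = qnorm(p_i, lower.tail = FALSE), the combined statistic is sum(w_i * z_i) / sqrt(sum(w_i^2)), and the meta p-value is its upper-tail probability. The sketch below re-implements that formula in base R as a cross-check of the manual call above. weighted_z is a hypothetical helper name (not part of metap), and it assumes every p-value is strictly between 0 and 1 with no NAs.

```r
# Hypothetical base-R helper (not part of metap): weighted Z / Stouffer combination,
# the formula metap::sumz() uses. Assumes 0 < p < 1 and no missing values.
weighted_z <- function(p, w = rep(1, length(p))) {
  z <- qnorm(p, lower.tail = FALSE)        # convert each p-value to a z score
  z_comb <- sum(w * z) / sqrt(sum(w^2))    # weighted combination of the z scores
  pnorm(z_comb, lower.tail = FALSE)        # back to a one-sided p-value
}

p1 <- c(0.9986, 0.7153, 0.604716)          # first row of tt (G, E, B)
w1 <- c(40.6325, 35.39774, 580.6436)       # matching weights (wg, we, wb)
weighted_z(p1, w1)                         # should agree with sumz(): p = 0.6940048
weighted_z(p1)                             # same row with equal weights, for comparison
```

With equal weights the large weight on B no longer dominates, so the unweighted result for this row is noticeably higher than the weighted one.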

I tried merging the two data frames and applying a function to each row:

> head(q)
       ID         P         G       E       wb      wg       we
1:  rs1029830 0.0979931 0.0054060 0.39160 580.6436 40.6325 35.39774
2:  rs1029832 0.1501820 0.0028140 0.39320 580.6436 40.6325 35.39774
3: rs11078374 0.1701250 0.0009805 0.49730 580.6436 40.6325 35.39774
4:  rs1124961 0.1710150 0.7252000 0.05737 580.6436 40.6325 35.39774
5:  rs1135237 0.1493650 0.6851000 0.06354 580.6436 40.6325 35.39774
6: rs11867934 0.0757972 0.0006140 0.00327 580.6436 40.6325 35.39774


helper <- function(x) {
   p <- sumz(x[2:4], weights = x[5:7])$p
   p
}

q$META <- apply(q, MARGIN = 1, helper)

but I get this error:

 Error in sumz(x[2:4], weights = x[5:7]) : 
  Must have at least two valid p values 
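A likely cause: apply() first converts q to a matrix, and because ID is a character column the whole matrix becomes character, so the function never receives numeric p-values. The sketch below shows the coercion on two rows copied from the head(q) output above (as a plain data.frame, an assumption for the sketch), and one way around it: give apply() only the numeric columns. So that it runs without metap, it uses a hypothetical base-R weighted_z() in place of sumz(); the columns are paired positionally (P/G/E with wb/wg/we), exactly as the helper above pairs x[2:4] with x[5:7].

```r
# Two rows copied from the question's head(q) output, as a plain data.frame
q <- data.frame(
  ID = c("rs1029830", "rs1029832"),
  P  = c(0.0979931, 0.1501820),
  G  = c(0.0054060, 0.0028140),
  E  = c(0.39160, 0.39320),
  wb = c(580.6436, 580.6436),
  wg = c(40.6325, 40.6325),
  we = c(35.39774, 35.39774),
  stringsAsFactors = FALSE
)

# apply() works on a matrix; the character ID column turns every cell into a string
typeof(as.matrix(q))                        # "character"

# Hypothetical stand-in for sumz(): weighted Z (Stouffer) combination
weighted_z <- function(p, w) {
  z <- qnorm(p, lower.tail = FALSE)
  pnorm(sum(w * z) / sqrt(sum(w^2)), lower.tail = FALSE)
}

# Fix: pass only the numeric columns, so each row x arrives as a numeric vector
num_cols <- c("P", "G", "E", "wb", "wg", "we")
q$META <- apply(q[, num_cols], MARGIN = 1, function(x) weighted_z(x[1:3], x[4:6]))
q$META
```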

1 Answer

Answer 0 (score: 0)

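First, since you say RS is the same in both, the question that raises for me is "how can we be sure the rows are always aligned correctly?" To be defensive, I'd say "not 100%", so I would join/merge the two tables so that the rows are guaranteed to be in the right order.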

library(data.table)
quux <- tt[df, on = "RS"]   # join on RS so each row's p-values and weights are matched by ID
quux
#           RS      G      E        B      wg       we       wb
# 1: rs2089177 0.9986 0.7153 0.604716 40.6325 35.39774 580.6436
# 2: rs4360974 0.9738 0.7838 0.430228 40.6325 35.39774 580.6436
# 3: rs6502526 0.9744 0.7839 0.429160 40.6325 35.39774 580.6436
# 4: rs8069906 0.7184 0.4918 0.521452 40.6325 35.39774 580.6436
# 5: rs9905280 0.7205 0.4861 0.465758 40.6325 35.39774 580.6436
# 6: rs4313843 0.9804 0.8522 0.474313 40.6325 35.39774 580.6436
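The same alignment can be sketched in base R without data.table's join syntax: match() looks up each tt$RS in df$RS, so the weights are attached by ID rather than by position. The values are copied from the question's dput() output, with tt and df rebuilt as plain data.frames (an assumption for the sketch); the rows of df are deliberately shuffled to show that the result does not depend on their order.

```r
tt <- data.frame(
  RS = c("rs2089177", "rs4360974", "rs6502526", "rs8069906", "rs9905280", "rs4313843"),
  G  = c(0.9986, 0.9738, 0.9744, 0.7184, 0.7205, 0.9804),
  E  = c(0.7153, 0.7838, 0.7839, 0.4918, 0.4861, 0.8522),
  B  = c(0.604716, 0.430228, 0.42916, 0.521452, 0.465758, 0.474313),
  stringsAsFactors = FALSE
)
df <- data.frame(
  wg = 40.6324993078201, we = 35.3977400408557, wb = 580.643608420863,
  RS = tt$RS, stringsAsFactors = FALSE
)
df <- df[c(6, 2, 4, 1, 5, 3), ]             # shuffle the rows on purpose

# Position in df of each RS from tt, then bind the matching weight columns
idx  <- match(tt$RS, df$RS)
quux <- cbind(tt, df[idx, c("wg", "we", "wb")])
rownames(quux) <- NULL
stopifnot(!anyNA(idx), identical(df$RS[idx], tt$RS))   # every RS found, order kept
quux
```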

From here, for each row it's just a matter of passing that row's p-values to sumz() together with the same row's weights:

library(metap)
quux$META <- sapply(seq_len(nrow(quux)), function(rn) {
  # as.matrix(), not as.vector(): as.vector() on a data.table does not give a
  # plain numeric row, so the weights would not reach sumz() as intended
  p <- as.matrix(quux[, .(G, E, B)])[rn, ]
  w <- as.matrix(quux[, .(wg, we, wb)])[rn, ]
  sumz(p, weights = w, na.action = na.fail)[["p"]]
})
quux
# META now holds the weighted meta p-value for each row; for rs2089177 it is
# about 0.694, matching the manual sumz() call in the question (p = 0.6940048)
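Because the weighted Z formula is plain arithmetic on each row, the per-row loop can also be replaced by matrix operations over the whole table. This is a swapped-in base-R technique, not a call to sumz(): it assumes no NAs and all p-values strictly between 0 and 1, and it does not reproduce sumz()'s own validity checks. The data are the merged rows shown above, rebuilt as a plain data.frame.

```r
quux <- data.frame(
  RS = c("rs2089177", "rs4360974", "rs6502526", "rs8069906", "rs9905280", "rs4313843"),
  G  = c(0.9986, 0.9738, 0.9744, 0.7184, 0.7205, 0.9804),
  E  = c(0.7153, 0.7838, 0.7839, 0.4918, 0.4861, 0.8522),
  B  = c(0.604716, 0.430228, 0.42916, 0.521452, 0.465758, 0.474313),
  wg = 40.6324993078201, we = 35.3977400408557, wb = 580.643608420863,
  stringsAsFactors = FALSE
)

P <- as.matrix(quux[, c("G", "E", "B")])     # p-values, one row per SNP
W <- as.matrix(quux[, c("wg", "we", "wb")])  # weights, same column order as P
Z <- qnorm(P, lower.tail = FALSE)            # element-wise z scores

# Weighted Z per row: sum(w * z) / sqrt(sum(w^2)), then back to a p-value
quux$META <- pnorm(rowSums(W * Z) / sqrt(rowSums(W^2)), lower.tail = FALSE)
quux$META[1]                                 # about 0.694 for rs2089177
```

The trade-off: this avoids calling sumz() once per row, which matters on large tables, but any row containing a p-value of exactly 0 or 1 would produce an infinite z score here instead of being handled by sumz().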

Or, in a more data.table-centric way:

mysumz <- function(x, w) sumz(unlist(x), weights = unlist(w), na.action = na.fail)[["p"]]
quux[, META := mysumz(.(G,E,B), .(wg,we,wb)), by = seq_len(nrow(quux))]

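(Borrowed from https://stackoverflow.com/a/36802640.) The helper function is needed because each call to mysumz receives a list for each of x and w, but sumz needs vectors. If you want to verify this, call debugonce(mysumz) first, then run quux[,META:=...] and inspect x and w to see how it works.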
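The point about lists can be seen without data.table: inside each by group, .(G, E, B) is just list(G, E, B), a list of three length-one vectors, and unlist() flattens it into the plain numeric vector that sumz() expects. A minimal base-R illustration using the first row's values:

```r
# What mysumz() receives for one group: lists of length-1 vectors, not vectors
x <- list(0.9986, 0.7153, 0.604716)      # like .(G, E, B) for the first row
w <- list(40.6325, 35.39774, 580.6436)   # like .(wg, we, wb) for the first row

is.list(x)        # TRUE: a list, while sumz() expects a numeric vector
p <- unlist(x)    # flatten into a plain numeric vector
is.numeric(p)     # TRUE
length(p)         # 3, one p-value per source
```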