附图(曼哈顿图)在x轴上包含来自基因组和Y轴-log(p)的染色体位置,其中p是与来自该特定位置的点(变体)相关的p值。
我使用以下R代码生成它(来自gap包):
require(gap)
affy <-c(40220, 41400, 33801, 32334, 32056, 31470, 25835, 27457, 22864, 28501, 26273,
24954, 19188, 15721, 14356, 15309, 11281, 14881, 6399, 12400, 7125, 6207)
CM <- cumsum(affy)
n.markers <- sum(affy)
n.chr <- length(affy)
test <- data.frame(chr=rep(1:n.chr,affy),pos=1:n.markers,p=runif(n.markers))
oldpar <- par()
par(cex=0.6)
colors <- c("red","blue","green","cyan","yellow","gray","magenta","red","blue","green", "cyan","yellow","gray","magenta","red","blue","green","cyan","yellow","gray","magenta","red")
mhtplot(test,control=mht.control(colors=colors),pch=19,bg=colors)
> head(test)
chr pos p
1 1 1 0.79296584
2 1 2 0.96675136
3 1 3 0.43870076
4 1 4 0.79825513
5 1 5 0.87554143
6 1 6 0.01207523
我感兴趣的是将图的峰值坐标高于某个阈值(-log(p))。
答案 0 :(得分:1)
如果您希望值的指数高于第99个百分位数:
# Add new column with log values
test = transform(test, log_p = -log10(test[["p"]]))
# Get the 99th percentile
pct99 = quantile(test[["log_p"]], 0.99)
...并从原始数据test
中获取值:
peaks = test[test[["log_p"]] > pct99,]
> head(peaks)
chr pos p log_p
5 1 5 0.002798126 2.553133
135 1 135 0.003077302 2.511830
211 1 211 0.003174833 2.498279
586 1 586 0.005766859 2.239061
598 1 598 0.008864987 2.052322
790 1 790 0.001284629 2.891222
您可以将此用于任何阈值。请注意,我没有计算出一阶导数,请参阅此问题以获得一些指示:
How to calculate first derivative of time series
计算一阶导数后,您可以通过查看一阶导数(几乎)为零的时间序列中的点来找到峰值。确定这些峰值后,您可以检查哪些峰值高于阈值。
答案 1 :(得分:0)
根据我绘制图形后的经验,您可以使用以下R代码来查找峰坐标
plot(x [,1],x [,2])
识别(x [,1],x [,2],标签= row.names(x))
coords&lt; - locator(type =“l”)
COORDS