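I recently started working with haplotype data. I'm playing around with data from the 1000 Genomes Project and trying to manipulate it with the pegas package in R. This is what I have so far: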
library(pegas)
a <- "ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502"
b <- "ALL.chrY.phase3_integrated_v1b.20130502.genotypes.vcf.gz"
url <- paste(a, b, sep = "/")
download.file(url, "chrY.vcf.gz")
(info <- VCFloci("chrY.vcf.gz"))
SNP <- is.snp(info)
X.SNP <- read.vcf("chrY.vcf.gz", which.loci = which(SNP))
h <- haplotype(X.SNP, 6020:6030)
net <- haploNet(h)
plot(net)
I want to plot a haplotype network, but the code doesn't run. I get the following message: 'h' must be of class 'haplotype'
If I print h, I get:
> h
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19]
. "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "T" "C" "C"
. "G" "G" "G" "G" "G" "G" "G" "G" "G" "G" "G" "G" "G" "G" "G" "G" "G" "A" "G"
. "C" "C" "C" "C" "C" "C" "T" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C" "C"
. "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "C" "T" "T" "T" "T" "T" "T" "T" "T"
. "G" "G" "G" "G" "G" "G" "G" "G" "G" "G" "G" "G" "G" "A" "G" "G" "G" "G" "G"
. "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "C" "T" "T" "T" "T" "T" "T" "T"
. "A" "A" "A" "A" "A" "A" "A" "A" "A" "C" "A" "A" "A" "A" "A" "A" "A" "A" "A"
. "G" "G" "G" "." "G" "G" "G" "G" "G" "G" "G" "G" "A" "G" "G" "G" "G" "G" "G"
. "." "T" "C" "T" "T" "C" "T" "." "." "." "T" "T" "T" "T" "C" "T" "T" "T" "T"
. "." "A" "." "A" "." "C" "A" "A" "C" "." "A" "A" "A" "A" "A" "C" "A" "A" "A"
. "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "T" "C"
attr(,"class")
[1] "haplotype.loci"
attr(,"freq")
[1] 18 1142 2 5 25 6 1 4 2 1 2 5 1 9 1 3 1 4 1
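It has clearly identified 19 haplotypes, so there must be a problem with the way the data is presented. Any suggestions? Also, there is very little material on pegas, or on how to manipulate VCF files with it. Does anyone know of a good resource (web page or book) on how to work with haplotypes from VCF files? It doesn't even have to be for pegas; any R library would do, or Python... anything really.
Thanks for your help, Peter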
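Answer 0 (score: 3)
I know this is an old post, but in case anyone else runs into the same problem, I found a way around it. Using the package "vcfR", you can read the VCF with read.vcfR() and then convert it to DNAbin with vcfR2DNAbin(). Calling haplotype() on the DNAbin gives an object of class "haplotype" rather than "haplotype.loci".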
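The steps in this answer could be sketched roughly as follows. This is a minimal sketch, not tested against the full chrY file: the helper name vcf_to_haplonet is hypothetical (it is not part of pegas or vcfR), and it assumes both packages are installed and that the VCF holds haploid SNP data as in the question. It also does not reproduce the 6020:6030 locus subset from the question.

```r
# Hypothetical helper (assumption, not part of pegas or vcfR):
# read a VCF file and build a haplotype network from it.
vcf_to_haplonet <- function(path) {
  vcf <- vcfR::read.vcfR(path, verbose = FALSE)    # read the VCF into a vcfR object
  dna <- vcfR::vcfR2DNAbin(vcf, verbose = FALSE)   # variant sites as a "DNAbin" alignment
  h <- pegas::haplotype(dna)                       # class "haplotype", not "haplotype.loci"
  pegas::haploNet(h)                               # network object of class "haploNet"
}

# Usage, assuming the file downloaded in the question:
# net <- vcf_to_haplonet("chrY.vcf.gz")
# plot(net)
```

The package-qualified calls (vcfR::, pegas::) keep the helper independent of which libraries are attached.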
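Answer 1 (score: 2)
This is the expected result: at the moment, haploNet() works only with the class "haplotype", which is produced from DNA sequences (class "DNAbin"). The output of read.vcf() is of class "loci", and haplotype() is a generic function with methods for both classes.
If you are dealing only with SNPs, you can work around this with: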
class(h) <- NULL
h <- as.DNAbin(h)
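To illustrate the class distinction described above, here is a small self-contained sketch using made-up toy data rather than the 1000 Genomes file. It shows that haplotype() applied to a "DNAbin" alignment returns an object of class "haplotype", which haploNet() accepts. The matrix x and its sample names are invented for illustration.

```r
library(ape)
library(pegas)

# Toy alignment (made-up data): rows are individuals, columns are SNP sites
x <- matrix(c("c", "g", "t",
              "c", "g", "t",
              "t", "g", "t",
              "c", "a", "c"),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("ind", 1:4), NULL))
dna <- as.DNAbin(x)   # character matrix -> "DNAbin"

h <- haplotype(dna)   # dispatches to the DNAbin method
class(h)              # includes "haplotype"

net <- haploNet(h)    # works because h is of class "haplotype"
plot(net)
```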
The (ultimate) goal is for haploNet() to also work with "haplotype.loci" (still under development) and other classes.
Cheers, Emmanuel