I have SNP IDs and coordinates in a BED file downloaded from UCSC. I want to map each SNP to its gene name.
chr1 9160974 9160975 rs1013578619 0 +
chr1 164528869 164528870 rs1016074293 0 +
chr1 192216772 192216773 rs1018731047 0 +
chr1 117157669 117157670 rs1022293363 0 +
chr1 33148118 33148119 rs1022386792 0 +
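I have read many posts that suggest bedtools intersect, the UCSC Table Browser, and similar tools, but I could not get a successful result. Please suggest an approach that works for this specific data.
Answer 0 (score: 1)
We can use the biomaRt package: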
# data
mySNPs <- read.table(text = "chr1 9160974 9160975 rs1013578619 0 +
chr1 164528869 164528870 rs1016074293 0 +
chr1 192216772 192216773 rs1018731047 0 +
chr1 117157669 117157670 rs1022293363 0 +
chr1 33148118 33148119 rs1022386792 0 +")
colnames(mySNPs) <- c("chr", "start", "end", "name", "x", "strand")
library(biomaRt)
snpmart <- useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
# Check which filters and attributes we want to use:
# listAttributes(snpmart)
# listFilters(snpmart)
# result
getBM(attributes = c("refsnp_id", "chr_name", "chrom_start", "chrom_end",
                     "ensembl_gene_stable_id"),
      filters = "snp_filter",
      values = mySNPs$name,
      mart = snpmart)
#      refsnp_id chr_name chrom_start chrom_end ensembl_gene_stable_id
# 1 rs1013578619        1     9160975   9160975        ENSG00000228526
# 2 rs1016074293        1   164528870 164528870
# 3 rs1018731047        1   192216773 192216773        ENSG00000285280
# 4 rs1022293363        1   117157670 117157670        ENSG00000134258
# 5 rs1022386792        1    33148119  33148119        ENSG00000278997
# 6 rs1022386792        1    33148119  33148119        ENSG00000116525
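The result above contains Ensembl gene IDs rather than gene names, which is what the question asks for. One possible way to get the names is a second query against the Ensembl gene mart, followed by a merge. The following is a minimal sketch, not a tested pipeline: the helper names fetch_gene_names() and add_gene_names() are hypothetical, and it assumes the "hsapiens_gene_ensembl" dataset with the "ensembl_gene_id" and "external_gene_name" attributes is reachable from your biomaRt installation.

```r
# Sketch: attach gene symbols to the SNP-to-gene table returned by getBM().
# fetch_gene_names() and add_gene_names() are hypothetical helper names.

# Query the Ensembl gene mart for the symbol of each gene ID.
# Needs the biomaRt package and an internet connection.
fetch_gene_names <- function(gene_ids) {
  # Drop missing and empty IDs (SNPs that did not map to any gene).
  gene_ids <- unique(gene_ids[!is.na(gene_ids) & gene_ids != ""])
  if (length(gene_ids) == 0) {
    return(data.frame(ensembl_gene_id = character(0),
                      external_gene_name = character(0),
                      stringsAsFactors = FALSE))
  }
  genemart <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                               dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                 filters = "ensembl_gene_id",
                 values = gene_ids,
                 mart = genemart)
}

# Left-join the symbols onto the SNP table, so SNPs without a gene
# are kept and get NA as their gene name.
add_gene_names <- function(snp_hits, gene_lookup) {
  merge(snp_hits, gene_lookup,
        by.x = "ensembl_gene_stable_id", by.y = "ensembl_gene_id",
        all.x = TRUE)
}

# Usage, assuming `res` holds the getBM() result shown above:
# lookup <- fetch_gene_names(res$ensembl_gene_stable_id)
# add_gene_names(res, lookup)
```

The merge uses all.x = TRUE so that a SNP with an empty ensembl_gene_stable_id (such as rs1016074293 above) stays in the table instead of being dropped. A SNP that overlaps several genes keeps one row per gene.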