Mapping SNPs to gene names

Date: 2018-04-09 01:33:36

Tags: r mapping bioinformatics genome

I have SNP IDs and coordinates in a BED file from UCSC. I want to map them to their gene names.

chr1    9160974     9160975     rs1013578619    0   +
chr1    164528869   164528870   rs1016074293    0   +
chr1    192216772   192216773   rs1018731047    0   +
chr1    117157669   117157670   rs1022293363    0   +
chr1    33148118    33148119    rs1022386792    0   +

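I have looked at many posts that suggest using bedtools intersect, the UCSC Table Browser, and so on, but I could not get a successful result. Please suggest an option that works for this particular data.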

1 answer:

Answer 0: (score: 1)

We can use the biomaRt package:

# data
mySNPs <- read.table(text = "chr1    9160974     9160975     rs1013578619    0   +
chr1    164528869   164528870   rs1016074293    0   +
chr1    192216772   192216773   rs1018731047    0   +
chr1    117157669   117157670   rs1022293363    0   +
chr1    33148118    33148119    rs1022386792    0   +", stringsAsFactors = FALSE)
colnames(mySNPs) <- c("chr", "start", "end", "name", "x", "strand")

library(biomaRt)

snpmart <- useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")

# Check which filters and attributes we want to use:
# listAttributes(snpmart)
# listFilters(snpmart)

# result
getBM(attributes = c("refsnp_id", "chr_name", "chrom_start", "chrom_end", "ensembl_gene_stable_id"), 
      filters = c("snp_filter"), 
      values = mySNPs$name, 
      mart = snpmart)

#      refsnp_id chr_name chrom_start chrom_end ensembl_gene_stable_id
# 1 rs1013578619        1     9160975   9160975        ENSG00000228526
# 2 rs1016074293        1   164528870 164528870                       
# 3 rs1018731047        1   192216773 192216773        ENSG00000285280
# 4 rs1022293363        1   117157670 117157670        ENSG00000134258
# 5 rs1022386792        1    33148119  33148119        ENSG00000278997
# 6 rs1022386792        1    33148119  33148119        ENSG00000116525
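
The query above returns Ensembl gene IDs rather than gene names, and a SNP can map to no gene (rs1016074293) or to more than one (rs1022386792). One possible way to get names is a second lookup against the Ensembl gene mart, followed by a merge. The following is only a sketch under assumptions: it assumes the gene mart `ENSEMBL_MART_ENSEMBL` with dataset `hsapiens_gene_ensembl` and the attribute `external_gene_name` are what you want (check with `listAttributes()`), and the helper names `addGeneNames` and `mapSnpsToGeneNames` are hypothetical, not part of biomaRt.

```r
# Join SNP-to-gene-ID rows with a gene-ID-to-name table.
# Pure data step: no network access needed.
addGeneNames <- function(snpGenes, geneNames) {
  merge(snpGenes, geneNames,
        by.x = "ensembl_gene_stable_id", by.y = "ensembl_gene_id",
        all.x = TRUE)  # keep SNPs that have no gene annotation (name becomes NA)
}

# Hypothetical wrapper: query the SNP mart for gene IDs, then the gene mart
# for gene names. Needs the biomaRt package and network access.
mapSnpsToGeneNames <- function(snpIDs) {
  snpmart <- biomaRt::useMart(biomart = "ENSEMBL_MART_SNP",
                              dataset = "hsapiens_snp")
  snpGenes <- biomaRt::getBM(attributes = c("refsnp_id", "ensembl_gene_stable_id"),
                             filters = "snp_filter",
                             values = snpIDs,
                             mart = snpmart)

  # SNPs outside any gene come back with an empty gene ID; drop those
  # before querying the gene mart.
  geneIDs <- unique(snpGenes$ensembl_gene_stable_id)
  geneIDs <- geneIDs[!is.na(geneIDs) & geneIDs != ""]
  if (length(geneIDs) == 0) {
    snpGenes$external_gene_name <- rep(NA_character_, nrow(snpGenes))
    return(snpGenes)
  }

  # Assumption: the gene mart and attribute names below are the ones wanted.
  genemart <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                               dataset = "hsapiens_gene_ensembl")
  geneNames <- biomaRt::getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                              filters = "ensembl_gene_id",
                              values = geneIDs,
                              mart = genemart)

  addGeneNames(snpGenes, geneNames)
}
```

With the data above, `mapSnpsToGeneNames(mySNPs$name)` should return one row per SNP-gene pair, with `NA` in `external_gene_name` for SNPs that have no gene ID. The merge is kept in its own function so it can be checked without querying Ensembl.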