How do I convert gene symbols to Ensembl IDs and uniprot_swissprot in R?

Time: 2016-10-19 10:02:06

Tags: r bioconductor biomart

I have a list of genes, with their P-values and fold-change values, stored as a matrix.

                   Symbols Entrez_IDs      logFC AveExpr          t   P.Value adj.P.Val        B
7987405        RASGRP1      10125 -9.924e-01   6.937 -5.467e+00 7.496e-07   0.01147  5.41279
8095728           EREG       2069  7.046e-01   5.467  5.302e+00 1.420e-06   0.01147  4.85944
7908397          RGS13       6003  6.332e-01   4.092  5.033e+00 3.949e-06   0.01728  3.97307
8176306         CSF2RA       1438  4.693e-01   5.085  5.012e+00 4.277e-06   0.01728  3.90397
8115355          GLRA1       2741 -1.548e+00   6.759 -4.928e+00 5.861e-06   0.01894  3.63094
7963826        PPP1R1A       5502 -9.774e-01   9.411 -4.710e+00 1.315e-05   0.03136  2.93060
7996022          CCL22       6367  6.668e-01   5.927  4.701e+00 1.358e-05   0.03136  2.90275
8139087          SFRP4       6424  1.520e+00   4.797  4.453e+00 3.340e-05   0.05467  2.12401
7929344          FFAR4     338557 -8.247e-01   6.682 -4.409e+00 3.908e-05   0.05467  1.98812
8119338          GLP1R       2740 -8.666e-01   8.111 -4.399e+00 4.052e-05   0.05467  1.95698
8100977          CXCL5       6374  6.301e-01   7.856  4.337e+00 5.047e-05   0.05467  1.76699
8104901           IL7R       3575  9.732e-01   4.962  4.331e+00 5.158e-05   0.05467  1.74821
8104570        FAM105A      54491 -9.411e-01   8.692 -4.330e+00 5.164e-05   0.05467  1.74718
8126244          LRFN2      57497 -7.189e-01   6.223 -4.317e+00 5.409e-05   0.05467  1.70720
7983630           FGF7       2252  1.032e+00   5.146  4.303e+00 5.685e-05   0.05467  1.66416
7919326           ACP6      51205 -4.909e-01   7.686 -4.302e+00 5.714e-05   0.05467  1.65977
7975268           ARG2        384 -9.104e-01   7.787 -4.273e+00 6.315e-05   0.05467  1.57340
7972021         TBC1D4       9882 -4.516e-01   7.663 -4.257e+00 6.684e-05   0.05467  1.52441
7938951           ANO5     203859 -6.176e-01   7.468 -4.230e+00 7.358e-05   0.05467  1.44148
7948881          WDR74      54663  4.599e-01   8.874  4.223e+00 7.532e-05   0.05467  1.42124
8120362          BEND6     221336 -5.006e-01   5.247 -4.220e+00 7.594e-05   0.05467  1.41416
8071953          SGSM1     129049 -4.729e-01   6.618 -4.216e+00 7.716e-05   0.05467  1.40042
8081548        NECTIN3      25945 -5.347e-01   8.841 -4.200e+00 8.144e-05   0.05467  1.35383
8154135         SLC1A1       6505  7.325e-01   8.062  4.183e+00 8.656e-05   0.05467  1.30118

I want to convert them to Ensembl gene IDs and uniprot_swissprot accessions. I tried the following code, but it fails every time:

library(biomaRt)
mart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
attributes=c('ensembl_gene_id','ensembl_transcript_id','hgnc_symbol', 'uniprot_swissprot')
genes <- rma_final$genes
rma_final<-rma_final[,-10]
G_list<- getBM(attributes=attributes, filters="hugene10stv1",values=genes, mart=mart, uniqueRows=T)

Error in getBM(attributes = attributes, filters = "hugene10stv1", values = genes, :   Values argument contains no data.

I also tried this command:

G_list<- getBM(attributes=attributes, filters="hugene10stv1", 
               values=rma_final$Symbols , mart=mart, uniqueRows=T) 

But that gives an error too:

Error in getBM(attributes = attributes, filters = "hugene10stv1", values = rma_final$Entrez_IDs, :   Invalid filters(s): hugene10stv1

Any help would be greatly appreciated.

1 answer:

Answer 0: (score: 1)

Use "hgnc_symbol" as the filter when the values are gene symbols:

genes <- c("RASGRP1","EREG")
G_list<- getBM(attributes=attributes, filters="hgnc_symbol",values=genes,
    mart=mart, uniqueRows=T)

   ensembl_gene_id ensembl_transcript_id hgnc_symbol uniprot_swissprot
1  ENSG00000172575       ENST00000310803     RASGRP1            O95267
2  ENSG00000172575       ENST00000558432     RASGRP1                  
3  ENSG00000172575       ENST00000561180     RASGRP1  
...
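
Because getBM() returns one row per transcript, a gene such as RASGRP1 appears several times, often with an empty uniprot_swissprot field. One possible way to bring the IDs back into the original table is to collapse the result to one row per symbol and then merge on the Symbols column. The sketch below runs offline: rma_final and G_list are small hand-built stand-ins that contain only values shown above, and collapse_ids is a hypothetical helper, not part of biomaRt.

```r
# Stand-in for two rows of the table in the question (values copied from it).
rma_final <- data.frame(
  Symbols    = c("RASGRP1", "EREG"),
  Entrez_IDs = c(10125, 2069),
  logFC      = c(-0.9924, 0.7046),
  stringsAsFactors = FALSE
)

# Stand-in for the getBM() result: only the three rows printed above.
# The real result would also contain the rows elided by "...".
G_list <- data.frame(
  ensembl_gene_id       = rep("ENSG00000172575", 3),
  ensembl_transcript_id = c("ENST00000310803", "ENST00000558432",
                            "ENST00000561180"),
  hgnc_symbol           = rep("RASGRP1", 3),
  uniprot_swissprot     = c("O95267", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop empty/NA entries, then join the unique values.
collapse_ids <- function(x) {
  paste(unique(x[!is.na(x) & x != ""]), collapse = ";")
}

# Collapse the transcript-level rows to one row per gene symbol.
per_gene <- aggregate(
  G_list[c("ensembl_gene_id", "uniprot_swissprot")],
  by  = list(Symbols = G_list$hgnc_symbol),
  FUN = collapse_ids
)

# Left join: keep every row of the original table, add the IDs where found.
# EREG gets NA here only because the stand-in G_list has no EREG rows.
annotated <- merge(rma_final, per_gene, by = "Symbols", all.x = TRUE)
print(annotated)
```

Joining multiple accessions with ";" is only one choice; keeping the transcript-level rows may be preferable if the downstream analysis needs them.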