Simple codon-to-amino-acid hash in R

Time: 2017-03-23 20:13:19

Tags: r hash bioinformatics bioconductor

I want to write an R script with a hash table in which I can look up a codon and get its associated amino acid. For example,

library(hash)

hashTable <- hash(...) #insert all codon-to-amino acid pairs
hashTable['TTT']

would return

[1] Phe

Does anyone know how I would do this? Or is there a package I could install (Bioconductor, perhaps?) that would make this easier?

3 answers:

Answer 0 (score: 4)

A solution to this problem almost certainly already exists. One possibility is Bioconductor's Biostrings package, for example:

library(Biostrings)
GENETIC_CODE[["ATG"]]
[1] "M"
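GENETIC_CODE returns one-letter codes, while the question asks for three-letter ones such as Phe. The following is a minimal sketch of one way to bridge that gap, assuming Biostrings is installed and using its AMINO_ACID_CODE vector (one-letter to three-letter codes). The helper name codonToAA is hypothetical and not part of Biostrings, and labelling stop codons as "Stop" is an arbitrary choice made here.

```r
library(Biostrings)  # Bioconductor package; assumed to be installed

# Hypothetical helper (not part of Biostrings): codon -> three-letter code.
codonToAA <- function(codon) {
  oneLetter <- GENETIC_CODE[codon]            # e.g. "TTT" -> "F", stop -> "*"
  threeLetter <- AMINO_ACID_CODE[oneLetter]   # "F" -> "Phe"; NA for "*"
  threeLetter[oneLetter == "*"] <- "Stop"     # label stop codons explicitly
  setNames(unname(threeLetter), codon)
}

codonToAA(c("TTT", "ATG", "TAA"))

# For whole sequences, translate() works on a DNAString directly.
as.character(translate(DNAString("ATGTTTTAA")))
```

For translating full sequences rather than single codons, translate() is probably the more convenient entry point, since it handles the codon splitting itself.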

Answer 1 (score: 1)

Why use a hash table at all?

acidLookup<-function(x){
  acids<-c("Isoleucine","Leucine","Valine","Phenylalanine","Methionine","Cysteine","Alanine","Glycine","Proline","Threonine","Serine",
         "Tyrosine","Tryptophan","Glutamine","Asparagine","Histidine","Glutamic acid","Aspartic acid","Lysine","Arginine","Stop codons")
  slc<-c("I","L","V","F","M","C","A","G","P","T","S","Y","W","Q","N","H","E","D","K","R","Stop")
  codon<-c("ATT, ATC, ATA","CTT, CTC, CTA, CTG, TTA, TTG","GTT, GTC, GTA, GTG","TTT, TTC","ATG","TGT, TGC",
         "GCT, GCC, GCA, GCG","GGT, GGC, GGA, GGG","CCT, CCC, CCA, CCG","ACT, ACC, ACA, ACG","TCT, TCC, TCA, TCG, AGT, AGC",
         "TAT, TAC","TGG","CAA, CAG","AAT, AAC","CAT, CAC","GAA, GAG","GAT, GAC","AAA, AAG","CGT, CGC, CGA, CGG, AGA, AGG","TAA, TAG, TGA")

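  # Split each group into its individual codons (the separator is ", ")
  codon.list<-strsplit(codon,", ",fixed=TRUE)

  # Index of the amino acid whose codon set contains x as an exact match
  idx<-which(vapply(codon.list,function(codons) x %in% codons,logical(1)))

  data.frame(acid=acids[idx],slc=slc[idx],codons=codon[idx])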
}

acidLookup("ATA")

        acid slc        codons
1 Isoleucine   I ATT, ATC, ATA
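If the lookup is called many times, the same grouped table can be flattened once into a named vector keyed by codon, so each query becomes a plain name lookup. This is only a sketch of that idea; the names codonGroups and codonToSlc are made up for illustration, and the codon groups are the ones listed in the function above.

```r
# One-letter code -> comma-separated codons (same groups as in acidLookup)
codonGroups <- c(
  I = "ATT, ATC, ATA", L = "CTT, CTC, CTA, CTG, TTA, TTG",
  V = "GTT, GTC, GTA, GTG", F = "TTT, TTC", M = "ATG", C = "TGT, TGC",
  A = "GCT, GCC, GCA, GCG", G = "GGT, GGC, GGA, GGG",
  P = "CCT, CCC, CCA, CCG", T = "ACT, ACC, ACA, ACG",
  S = "TCT, TCC, TCA, TCG, AGT, AGC", Y = "TAT, TAC", W = "TGG",
  Q = "CAA, CAG", N = "AAT, AAC", H = "CAT, CAC", E = "GAA, GAG",
  D = "GAT, GAC", K = "AAA, AAG", R = "CGT, CGC, CGA, CGG, AGA, AGG",
  Stop = "TAA, TAG, TGA"
)

# Split each group into codons; the list keeps the one-letter codes as names.
codonList <- strsplit(codonGroups, ", ", fixed = TRUE)

# Repeat each one-letter code once per codon and name the result by codon.
codonToSlc <- setNames(rep(names(codonList), lengths(codonList)),
                       unlist(codonList, use.names = FALSE))

codonToSlc[["ATA"]]                   # single lookup
codonToSlc[c("ATG", "TTT", "TGA")]    # vectorised lookup
```

The vector has one entry per codon (64 in total), and indexing it with a character vector looks up several codons in one call.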

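Answer 2 (score: 0)

There is no need for a dedicated hash table implementation. If Biostrings is not enough, the standard named-element notation used for base R vectors and lists should work: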

aaCodes <- character(0)
aaCodes["ATG"] <- "Met"
aaCodes["TGG"] <- "Trp"
aaCodes[c("CTC","AGG")] <- c("Leu","Arg")

> names(aaCodes)
[1] "ATG" "TGG" "CTC" "AGG"

> aaCodes[c("ATG","ATG","CTC","TGG")]
  ATG   ATG   CTC   TGG
"Met" "Met" "Leu" "Trp"

> substring("ATGATGCTCTGG",0:3*3+1,0:3*3+3)
[1] "ATG" "ATG" "CTC" "TGG"

> aaCodes[substring("ATGATGCTCTGG",0:3*3+1,0:3*3+3)]
  ATG   ATG   CTC   TGG
"Met" "Met" "Leu" "Trp"

This does not show the internal hash values R uses for each string, but the question does not seem to ask for that.
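To extend the named-vector approach to all 64 codons without typing each pair, one option is to generate the codons programmatically and pair them with the standard genetic code written as a 64-character string in TCAG order. The sketch below assumes the standard code (NCBI translation table 1) and DNA codons; the names codonTable, translateCodons and codonEnv are illustrative only, and "Stop" is an arbitrary label for stop codons.

```r
# Codons in TCAG order, third position varying fastest: TTT, TTC, TTA, ...
bases <- c("T", "C", "A", "G")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
codons <- paste0(grid$first, grid$second, grid$third)

# Standard genetic code in the same order; "*" marks a stop codon.
oneLetter <- strsplit(
  "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
  "")[[1]]

# One-letter to three-letter amino acid codes.
threeLetter <- c(A = "Ala", R = "Arg", N = "Asn", D = "Asp", C = "Cys",
                 Q = "Gln", E = "Glu", G = "Gly", H = "His", I = "Ile",
                 L = "Leu", K = "Lys", M = "Met", F = "Phe", P = "Pro",
                 S = "Ser", T = "Thr", W = "Trp", Y = "Tyr", V = "Val",
                 "*" = "Stop")

# Named vector: codon -> three-letter code.
codonTable <- setNames(unname(threeLetter[oneLetter]), codons)

# Hypothetical helper: split a DNA string into codons and look each one up.
translateCodons <- function(dna) {
  n <- nchar(dna) %/% 3              # a trailing partial codon is ignored
  starts <- seq_len(n) * 3 - 2
  codonTable[substring(dna, starts, starts + 2)]
}

codonTable[["TTT"]]
translateCodons("ATGATGCTCTGA")

# If an explicit hash table is preferred, a hashed environment also works.
codonEnv <- list2env(as.list(codonTable), envir = new.env(hash = TRUE))
codonEnv[["TTT"]]
```

Either structure answers the original lookup; the named vector is usually handier because it accepts a whole vector of codons at once, while the environment only returns one key per `[[` call.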