I want to write an R script that uses a hash table to look up a codon and get its corresponding amino acid. For example,
library(hash)
hashTable <- hash(...) #insert all codon-to-amino acid pairs
hashTable['TTT']
would return
[1] Phe
Does anyone know how I would do this? Or is there a package I could install (Bioconductor?) that would make this easier?
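For the question as asked, base R already provides a hash table: an environment created with hashing enabled. The following is a minimal sketch, assuming the standard genetic code (NCBI table 1) and the three-letter output shown in the question. The names `codonTable`, `bases`, `aa1` and `three` are hypothetical. The 64-character string lists the one-letter amino acid for each codon in TCAG order, with `*` marking a stop codon.

```r
# Build a hashed environment mapping each of the 64 codons to a
# three-letter amino acid code (standard genetic code, NCBI table 1).
bases <- c("T", "C", "A", "G")
# Codons in TCAG order: first base varies slowest, third base fastest
codons <- paste0(rep(bases, each = 16),
                 rep(rep(bases, each = 4), times = 4),
                 rep(bases, times = 16))
# One-letter amino acids for the codons above, same order ("*" = stop)
aa1 <- strsplit(paste0("FFLLSSSSYY**CC*W", "LLLLPPPPHHQQRRRR",
                       "IIIMTTTTNNKKSSRR", "VVVVAAAADDEEGGGG"), "")[[1]]
# One-letter to three-letter codes
three <- c(A = "Ala", R = "Arg", N = "Asn", D = "Asp", C = "Cys",
           Q = "Gln", E = "Glu", G = "Gly", H = "His", I = "Ile",
           L = "Leu", K = "Lys", M = "Met", F = "Phe", P = "Pro",
           S = "Ser", T = "Thr", W = "Trp", Y = "Tyr", V = "Val",
           "*" = "Stop")
# list2env(..., hash = TRUE) creates a new environment backed by a hash table
codonTable <- list2env(as.list(setNames(unname(three[aa1]), codons)),
                       hash = TRUE)

codonTable[["TTT"]]   # "Phe"
codonTable[["ATG"]]   # "Met"
```

`get()`, `exists()` and `mget()` also work with `envir = codonTable`. Note that `[[` on an environment returns `NULL` for a missing key rather than `NA`.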
Answer 0 (score: 4)
There is almost certainly an existing solution for this. One option is the Biostrings package from Bioconductor, for example:
library(Biostrings)
GENETIC_CODE[["ATG"]]
[1] "M"
Answer 1 (score: 1)
Why use a hash table at all?
acidLookup <- function(x) {
  acids <- c("Isoleucine","Leucine","Valine","Phenylalanine","Methionine","Cysteine","Alanine","Glycine","Proline","Threonine","Serine",
             "Tyrosine","Tryptophan","Glutamine","Asparagine","Histidine","Glutamic acid","Aspartic acid","Lysine","Arginine","Stop codons")
  slc <- c("I","L","V","F","M","C","A","G","P","T","S","Y","W","Q","N","H","E","D","K","R","Stop")
  codon <- c("ATT, ATC, ATA","CTT, CTC, CTA, CTG, TTA, TTG","GTT, GTC, GTA, GTG","TTT, TTC","ATG","TGT, TGC",
             "GCT, GCC, GCA, GCG","GGT, GGC, GGA, GGG","CCT, CCC, CCA, CCG","ACT, ACC, ACA, ACG","TCT, TCC, TCA, TCG, AGT, AGC",
             "TAT, TAC","TGG","CAA, CAG","AAT, AAC","CAT, CAC","GAA, GAG","GAT, GAC","AAA, AAG","CGT, CGC, CGA, CGG, AGA, AGG","TAA, TAG, TGA")
  # Split each group on ", " so the individual codons carry no stray spaces
  codon.list <- strsplit(codon, ", ", fixed = TRUE)
  # Exact match of x within each group (grep() would also accept partial
  # inputs such as "AT", which match many groups)
  idx <- which(vapply(codon.list, function(v) x %in% v, logical(1)))
  data.frame(acid = acids[idx], slc = slc[idx], codons = codon[idx])
}
acidLookup("ATA")
acid slc codons
1 Isoleucine I ATT, ATC, ATA
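Building on the grouped data in `acidLookup()`, here is a minimal sketch that flattens the codon groups once into a named vector, so each lookup is an exact name match instead of a search over every group on every call. The name `codonToSlc` is hypothetical; the `slc` and `codon` vectors are copied from the function above.

```r
# Grouped data in the same layout as acidLookup(): one-letter codes and
# comma-separated codon groups, in matching order
slc <- c("I","L","V","F","M","C","A","G","P","T","S",
         "Y","W","Q","N","H","E","D","K","R","Stop")
codon <- c("ATT, ATC, ATA", "CTT, CTC, CTA, CTG, TTA, TTG", "GTT, GTC, GTA, GTG",
           "TTT, TTC", "ATG", "TGT, TGC", "GCT, GCC, GCA, GCG", "GGT, GGC, GGA, GGG",
           "CCT, CCC, CCA, CCG", "ACT, ACC, ACA, ACG", "TCT, TCC, TCA, TCG, AGT, AGC",
           "TAT, TAC", "TGG", "CAA, CAG", "AAT, AAC", "CAT, CAC", "GAA, GAG",
           "GAT, GAC", "AAA, AAG", "CGT, CGC, CGA, CGG, AGA, AGG", "TAA, TAG, TGA")

# Split each group into individual codons
codon.list <- strsplit(codon, ", ", fixed = TRUE)
# Repeat each one-letter code once per codon in its group, named by codon
codonToSlc <- setNames(rep(slc, lengths(codon.list)), unlist(codon.list))

codonToSlc[["ATA"]]                         # "I"
unname(codonToSlc[c("TTT", "TGG", "TAA")])  # "F" "W" "Stop"
```

Because the names are the 64 distinct codons, `[[` fails loudly on an invalid codon, while `[` returns `NA` for it.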
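Answer 2 (score: 0)
There's no need for a dedicated hash table implementation. If Biostrings isn't enough, the standard name-based indexing of base R vectors and lists should work: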
> aaCodes <- character(0)
> aaCodes["ATG"] <- "Met"
> aaCodes["TGG"] <- "Trp"
> aaCodes[c("CTC","AGG")] <- c("Leu","Arg")
> names(aaCodes)
[1] "ATG" "TGG" "CTC" "AGG"
> aaCodes[c("ATG","ATG","CTC","TGG")]
  ATG   ATG   CTC   TGG
"Met" "Met" "Leu" "Trp"
> substring("ATGATGCTCTGG",0:3*3+1,0:3*3+3)
[1] "ATG" "ATG" "CTC" "TGG"
> aaCodes[substring("ATGATGCTCTGG",0:3*3+1,0:3*3+3)]
  ATG   ATG   CTC   TGG
"Met" "Met" "Leu" "Trp"
This doesn't expose the internal hash values R uses for each string, but the question doesn't seem to ask for that.
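The `substring()` call above hard-codes exactly four codons through `0:3*3+1`. Here is a minimal sketch of a general version, assuming the reading frame starts at the first base. `splitCodons` is a hypothetical helper, and the small `aaCodes` vector holds only a few entries from the standard genetic code for illustration.

```r
# Split a sequence into consecutive codons (reading frame starting at base 1);
# trailing bases that do not form a full codon are dropped
splitCodons <- function(s) {
  n <- nchar(s) %/% 3
  if (n == 0) return(character(0))
  starts <- seq(1, by = 3, length.out = n)
  substring(s, starts, starts + 2)
}

# Small lookup vector with a few standard-code entries (illustrative subset)
aaCodes <- c(ATG = "Met", CTC = "Leu", TGG = "Trp", AGG = "Arg", TAA = "Stop")

cod <- splitCodons("ATGCTCTGGAGGTAA")
aa <- unname(aaCodes[cod])   # Met, Leu, Trp, Arg, Stop
aa
```

Codons missing from the lookup vector come back as `NA`, which makes gaps in the table easy to spot.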