I have a DNA sequence as my input.
sequence<-c("ATGAATTTTGATTTA")
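I want to count how many times ATG and each of the other codons (64 in total) occur in it. The 64 codons and the amino acids they encode are: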
codon <- list(ATA = "I", ATC = "I", ATT = "I", ATG = "M", ACA = "T",
ACC = "T", ACG = "T", ACT = "T", AAC = "N", AAT = "N", AAA = "K",
AAG = "K", AGC = "S", AGT = "S", AGA = "R", AGG = "R", CTA = "L",
CTC = "L", CTG = "L", CTT = "L", CCA = "P", CCC = "P", CCG = "P",
CCT = "P", CAC = "H", CAT = "H", CAA = "Q", CAG = "Q", CGA = "R",
CGC = "R", CGG = "R", CGT = "R", GTA = "V", GTC = "V", GTG = "V",
GTT = "V", GCA = "A", GCC = "A", GCG = "A", GCT = "A", GAC = "D",
GAT = "D", GAA = "E", GAG = "E", GGA = "G", GGC = "G", GGG = "G",
GGT = "G", TCA = "S", TCC = "S", TCG = "S", TCT = "S", TTC = "F",
TTT = "F", TTA = "L", TTG = "L", TAC = "Y", TAT = "Y", TAA = "stop",
TAG = "stop", TGC = "C", TGT = "C", TGA = "stop", TGG = "W")
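Then I want to calculate, for each codon, its share among the codons that encode the same amino acid, and I would like the result in the following form.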
codon count amino_acids percentage
CTC 19666 L 0.18
CTT 27340 L 0.13
CTA 31534 L 0.20
CTG 76644 L 0.49
Please help me solve this.
Answer 0 (score: 1)
As long as your codons have no frame shifts or gaps:
sequence<-c("ATGAATTTTGATTTAATG")
#split into 3-character codons
#start positions stop at nchar-2 so an incomplete trailing codon is dropped
starts<-seq(1, nchar(sequence)-2, 3)
splitseq<-substring(sequence, starts, starts+2)
splitseq
# [1] "ATG" "AAT" "TTT" "GAT" "TTA" "ATG"
#table them to get the frequency
x<-as.data.frame(table(splitseq))
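#match up codon translation (unlist so the column is a character vector, not a list)
x$codon<-unname(unlist(codon)[as.character(x$splitseq)])
#get each codon's share of all codons in the sequence
x$percentage<-x$Freq / sum(x$Freq)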
x
splitseq Freq codon percentage
1 AAT 1 N 0.1666667
2 ATG 2 M 0.3333333
3 GAT 1 D 0.1666667
4 TTA 1 L 0.1666667
5 TTT 1 F 0.1666667
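
Note that the percentage above is each codon's share of all codons in the sequence, whereas the question asks for the share within each amino acid (so that CTC, CTT, CTA and CTG add up to 1). A minimal sketch of that variant follows. It assumes a clean reading frame starting at position 1; `codon_map`, `starts` and `result` are names chosen for this illustration, and the short test sequence is made up so that one amino acid (L) has several synonymous codons.

```r
# Codon -> amino acid lookup as a named character vector (same mapping as the
# `codon` list in the question; the name `codon_map` is an assumption).
codon_map <- c(
  ATA = "I", ATC = "I", ATT = "I", ATG = "M", ACA = "T", ACC = "T",
  ACG = "T", ACT = "T", AAC = "N", AAT = "N", AAA = "K", AAG = "K",
  AGC = "S", AGT = "S", AGA = "R", AGG = "R", CTA = "L", CTC = "L",
  CTG = "L", CTT = "L", CCA = "P", CCC = "P", CCG = "P", CCT = "P",
  CAC = "H", CAT = "H", CAA = "Q", CAG = "Q", CGA = "R", CGC = "R",
  CGG = "R", CGT = "R", GTA = "V", GTC = "V", GTG = "V", GTT = "V",
  GCA = "A", GCC = "A", GCG = "A", GCT = "A", GAC = "D", GAT = "D",
  GAA = "E", GAG = "E", GGA = "G", GGC = "G", GGG = "G", GGT = "G",
  TCA = "S", TCC = "S", TCG = "S", TCT = "S", TTC = "F", TTT = "F",
  TTA = "L", TTG = "L", TAC = "Y", TAT = "Y", TAA = "stop", TAG = "stop",
  TGC = "C", TGT = "C", TGA = "stop", TGG = "W"
)

# Made-up example sequence: CTG CTG CTC TTA ATG
sequence <- "CTGCTGCTCTTAATG"

# Split into non-overlapping triplets, dropping any incomplete trailing codon.
starts <- seq(1, nchar(sequence) - 2, by = 3)
codons <- substring(sequence, starts, starts + 2)

# Count all 64 codons; using factor levels keeps codons that never occur (count 0).
counts <- table(factor(codons, levels = names(codon_map)))

result <- data.frame(
  codon = names(codon_map),
  count = as.vector(counts),
  amino_acids = unname(codon_map),
  stringsAsFactors = FALSE
)

# Share of each codon among the codons seen for the same amino acid.
# ave() returns the per-amino-acid total aligned with each row; amino acids
# that never occur give 0/0, i.e. NaN.
result$percentage <- result$count /
  ave(result$count, result$amino_acids, FUN = sum)

# Group synonymous codons together and print only amino acids that occur.
result <- result[order(result$amino_acids, result$codon), ]
print(result[!is.nan(result$percentage), ], row.names = FALSE)
```

Using `factor(..., levels = names(codon_map))` is what makes absent codons show up with a count of 0. If you would rather see 0 than NaN for amino acids that never occur, replace those values afterwards, for example with `result$percentage[is.nan(result$percentage)] <- 0`.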