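If the data is

t <- c("UUU","UUC","UUA","UUG","CUU","CUC","CUA","CUG","AUU","AUC","AUA","AUG","GUU","GUC","GUA","GUG",
       "UCU","UCC","UCA","UCG","CCU","CCC","CCA","CCG","ACU","ACC","ACA","ACG","GCU","GCC","GCA","GCG",
       "UAU","UAC","UAA","UAG","CAU","CAC","CAA","CAG","AAU","AAC","AAA","AAG","GAU","GAC","GAA","GAG",
       "UGU","UGC","UGA","UGG","CGU","CGC","CGA","CGG","AGU","AGC","AGA","AGG","GGU","GGC","GGA","GGG")

how can I calculate the percentage of each of these codons in an argument?
I want to write a function for this, so that I can reuse it for more problems in the future.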
Suppose our argument is:
"UUUUUCUUAUUGCUUCUCCUACUGAUUAUCAUAAUGGUUGUCGUAGUGUCUUCCUCAUCGCCUCCCCCACCGACUACCACAACGGCUGCCGCAGCGUAUUACUAAUAGCAUCACCAACAGAAUAACAAAAAGGAUGACGAAGAGUGUUGCUGAUGGCGUCGCCGACGGAGUAGCAGAAGAGGUGGCGGAGGG"
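Also, the reading frame starts at the first position and is split into groups of 3 (e.g. -UG, GUG). I came up with the code below, but I want the answer as a list with two columns named count and percentage. Please help me modify this code so it gives the percentages in that form.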
library(stringr)
seq_long <- "UUUAUGGGCG"                                          # your sequence
seqn <- unlist(str_extract_all(seq_long, pattern = "[AUGC]{3}"))  # split into codons first
l_seq <- length(seqn)                                             # number of codons
u_seq <- unique(seqn)                                             # unique codons
colSums(sapply(u_seq, function(s) str_count(string = seqn, pattern = s))) / l_seq
Please help me correct this code. I want my argument to be one continuous string, such as UGCUGCUAUGAAUGAUG.
Answer:
This might work for you:
library(stringr)
bases <- c("U","A","G","C")
sapply(bases, function(b) str_count(string = c("UUA","AUC","GUA"),pattern = b))
     U A G C
[1,] 2 1 0 0
[2,] 1 1 0 1
[3,] 1 1 1 0
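If what you need is the percentage of each base across a whole string (rather than per codon), a minimal base-R sketch could look like this. The helper name base_percentage is hypothetical, and it assumes the input contains only the uppercase letters U, A, G and C.

```r
# Hypothetical helper: count and percentage of each base in one RNA string.
# Assumes only uppercase U/A/G/C; bases that never occur get a count of 0.
base_percentage <- function(seq_long, bases = c("U", "A", "G", "C")) {
  chars <- strsplit(seq_long, "")[[1]]              # split into single letters
  counts <- table(factor(chars, levels = bases))    # keep all four levels
  data.frame(
    base = bases,
    count = as.integer(counts),
    percentage = 100 * as.integer(counts) / length(chars),
    stringsAsFactors = FALSE
  )
}

base_percentage("UUUAUGGGCG")
```

For "UUUAUGGGCG" this should give counts 4, 1, 4, 1 for U, A, G, C, i.e. 40%, 10%, 40% and 10%.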
EDIT: basic genetics
EDIT2: Based on your comment, this might work:
seqn <- c("UUA","AUC","GUA", "UUA", "GAU", "UUA") #your sequence
l_seq <- length(seqn) #length of sequence
u_seq <- unique(seqn) #unique codons
# This calculates the fractions of the unique codons in your sequence
colSums(sapply(u_seq, function(s) str_count(string = seqn,pattern = s)))/l_seq
      UUA       AUC       GUA       GAU
0.5000000 0.1666667 0.1666667 0.1666667
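To get the two columns asked for (count and percentage) instead of a named vector of fractions, one possible sketch uses base R's table() and returns a data frame, which is itself a list of columns. The helper name codon_table and its levels argument are hypothetical, not part of stringr.

```r
seqn <- c("UUA", "AUC", "GUA", "UUA", "GAU", "UUA")  # same example as EDIT2

# Hypothetical helper: one row per codon with a count column and a
# percentage column (0-100). `levels` controls which codons are listed.
codon_table <- function(codons, levels = sort(unique(codons))) {
  counts <- as.integer(table(factor(codons, levels = levels)))
  data.frame(
    codon = levels,
    count = counts,
    percentage = 100 * counts / length(codons),
    stringsAsFactors = FALSE
  )
}

codon_table(seqn)
```

Passing a character vector of all 64 codons as levels (for example the question's t, written with quoted strings) would also list the codons that never occur, with a count of 0.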
EDIT3: Regarding your second question, you can split your string into 3-letter codons like this:
seq_long <- "UUUAUGGGCG"
seqn <- unlist(str_extract_all(seq_long, pattern = "[AUGC]{3}"))
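Then run the code from EDIT2. If the length of your sequence is not a multiple of 3, the trailing one or two letters are lost. You can work around this with some padding.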
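Putting the splitting and counting together, a sketch of a reusable function might look like the following. It is written in base R (substring() instead of str_extract_all()), the name codon_usage and its frame argument are hypothetical, and it assumes the input is uppercase A/U/G/C only. Trailing bases that do not form a full codon are simply dropped rather than padded.

```r
# Hypothetical helper: split a continuous RNA string into codons in a chosen
# reading frame and report count and percentage for all 64 codons.
# Assumes uppercase A/U/G/C only; 1-2 trailing bases are dropped.
codon_usage <- function(seq_long, frame = 1) {
  n <- nchar(seq_long)
  n_codons <- max(0, (n - frame + 1) %/% 3)       # number of complete codons
  starts <- frame + 3 * (seq_len(n_codons) - 1)    # 1, 4, 7, ... for frame 1
  codons <- if (n_codons > 0) substring(seq_long, starts, starts + 2) else character(0)

  # All 64 codons in the same order as the question's vector t:
  # third base varies fastest, then the first, then the second.
  bases <- c("U", "C", "A", "G")
  grid <- expand.grid(b3 = bases, b1 = bases, b2 = bases, stringsAsFactors = FALSE)
  all_codons <- paste0(grid$b1, grid$b2, grid$b3)

  counts <- as.integer(table(factor(codons, levels = all_codons)))
  data.frame(
    codon = all_codons,
    count = counts,
    percentage = if (n_codons > 0) 100 * counts / n_codons else 0,
    stringsAsFactors = FALSE
  )
}

res <- codon_usage("UGCUGCUAUGAAUGAUG")
res[res$count > 0, ]   # only the codons that actually occur
```

For "UGCUGCUAUGAAUGAUG" the codons read in frame 1 are UGC, UGC, UAU, GAA and UGA (the final "UG" is dropped), so UGC should get 40% and the other three 20% each. Setting frame = 2 or frame = 3 reads the other two reading frames.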