Given a longer string, how do I calculate the probability of finding a word of a given length somewhere in that string?
So far, I have this:
import math
from scipy import stats
alphabet = list("ATCG") # This is the alphabet I am working with
string = "AATCAGTAGATCG" # Here are two example strings
string2 = "TGTAAACCTTGGTTTATCG"
word = "ATCG" # This is my word
n_substrings = len(string) - len(word) # The number of possible substrings
n_substrings2 = len(string2) - len(word)
prob_match = math.pow(len(alphabet), - len(word)) # The probability of randomly choosing the word from the alphabet
# Get the probability from a binomial test?
print stats.binom_test(1, n_substrings, p=prob_match) # (Number of successes, number of trials, prob of success)
print stats.binom_test(1, n_substrings2, p=prob_match)
>>>0.0346119111615
0.0570183821615
Is this an appropriate approach, or am I missing something?
Answer 0 (score: 1)
I think you should use this instead:
n_substrings = len(string) - len(word) +1
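A string of length 5 contains two substrings of length 4, not one: for example, ATCGA contains both ATCG and TCGA. In general, a string of length L contains L - k + 1 substrings of length k.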
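To make the off-by-one concrete, here is a minimal sketch that enumerates the windows directly and recomputes the chance of at least one match with the corrected count. The helper names `windows` and `prob_at_least_one` are hypothetical, introduced only for illustration. The probability treats every window as an independent trial with uniform letter frequencies, which is the same simplifying assumption as the binomial model in the question; overlapping windows are not truly independent, so the result is an approximation.

```python
def windows(text, k):
    """Return every length-k substring (window) of text, in order."""
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def prob_at_least_one(text, word, alphabet_size):
    """Approximate P(word occurs at least once in a random string of len(text)).

    Assumption: each window is an independent trial and letters are uniform.
    """
    n_substrings = len(text) - len(word) + 1   # corrected window count
    prob_match = alphabet_size ** -len(word)   # chance that one window matches
    return 1 - (1 - prob_match) ** n_substrings


word = "ATCG"
print(windows("ATCGA", len(word)))  # ['ATCG', 'TCGA']

for s in ("AATCAGTAGATCG", "TGTAAACCTTGGTTTATCG"):
    # The formula agrees with an explicit enumeration of the windows.
    assert len(windows(s, len(word))) == len(s) - len(word) + 1
    print(len(s) - len(word) + 1, prob_at_least_one(s, word, 4))
```

With the corrected count, the two example strings have 10 and 16 windows rather than 9 and 15, so the estimated probabilities come out slightly higher than the values printed in the question.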