我正在编写代码模拟一个非常简单的马尔可夫链,从两个转换矩阵中的任何一个生成10000个6个核苷酸的序列(即如果先前的核苷酸是A,则使用这组概率生成下一个核苷酸,依此类推)。我也通过从所述矩阵中获取相应的概率来获得运行的产品,但这些产品也没有正确传播。
我知道MATLAB可能不会像其他语言那样处理字符串/字符,因此我不完全确定我的代码会发生什么。基本上,我正在抽样,但它似乎没有正确分配。我的代码如下。这无疑是不优雅的,但它应该有用......但事实并非如此。考虑到那些转换矩阵,有没有更简单的方法来做到这一点? (显然,代码仅适用于第一个矩阵(M1)。)
clear all;
close all;
%length of sequences
n = 6;
%nucleotide map
N = 'ACGT';
%initial probabilities
% A C G T
M0 = [0.25 0.25 0.25 0.25];
%transition probabilities (if row, then col)
%inside CpG
% A C G T
M1 = [0.18 0.274 0.426 0.12 %A
0.171 0.367 0.274 0.188 %C
0.161 0.339 0.375 0.125 %G
0.079 0.355 0.384 0.182]; %T
%outside CpG
% A C G T
M2 = [0.3 0.205 0.285 0.21 %A
0.322 0.298 0.078 0.302 %C
0.248 0.246 0.298 0.208 %G
0.177 0.239 0.292 0.292]; %T
sampletype = input('Matrix Type (M1 or M2): ', 's');
sampseq = zeros(n,1);
PM1 = 1;
PM2 = 1;
count = 0;
if strcmp(sampletype,'M1')
for i = 1:10000
for position = 1:n
if position == 1
firstnuc = randsample(N,1,true,M0);
sampseq(1) = firstnuc;
PM1 = PM1*0.25;
PM2 = PM2*0.25;
%check for previous nucleotide
elseif position ~= 1
if strcmp(sampseq(position-1),'A')
tempnuc = randsample(N,1,true,M1(1,:));
sampseq(position) = tempnuc;
PM1 = PM1*M1(1,findstr(N,tempnuc));
PM2 = PM2*M2(1,findstr(N,tempnuc));
elseif strcmp(sampseq(position-1),'C')
tempnuc = randsample(N,1,true,M1(2,:));
sampseq(position) = tempnuc;
PM1 = PM1*M1(2,findstr(N,tempnuc));
PM2 = PM2*M2(2,findstr(N,tempnuc));
elseif strcmp(sampseq(position-1),'G')
tempnuc = randsample(N,1,true,M1(3,:));
sampseq(position) = tempnuc;
PM1 = PM1*M1(3,findstr(N,tempnuc));
PM2 = PM2*M2(3,findstr(N,tempnuc));
elseif strcmp(sampseq(position-1),'T')
tempnuc = randsample(N,1,true,M1(4,:));
sampseq(position) = tempnuc;
PM1 = PM1*M1(4,findstr(N,tempnuc));
PM2 = PM2*M2(4,findstr(N,tempnuc));
end
end
end
if PM2 > PM1
count = count + 1;
end
PM1
PM2
PM1 = 1;
PM2 = 1;
sampseq
sampseq = zeros(n,1);
end
end
count
答案 0 :(得分:1)
% A C G T
N = [1 2 3 4]
之后索引修复很简单。
我仍然有兴趣弄清楚如何生成“真实”序列,正如我最初尝试的那样,而不仅仅是1s,2s,3s和4s组合的串。