MATLAB基本马尔可夫链实现

时间:2013-10-16 15:20:48

标签: string matlab matrix random-sample markov-chains

我正在编写代码模拟一个非常简单的马尔可夫链,从两个转换矩阵中的任何一个生成10000个6个核苷酸的序列(即如果先前的核苷酸是A,则使用这组概率生成下一个核苷酸,依此类推)。我也通过从所述矩阵中获取相应的概率来获得运行的产品,但这些产品也没有正确传播。

我知道MATLAB可能不会像其他语言那样处理字符串/字符,因此我不完全确定我的代码会发生什么。基本上,我正在抽样,但它似乎没有正确分配。我的代码如下。这无疑是不优雅的,但它应该有用......但事实并非如此。考虑到那些转换矩阵,有没有更简单的方法来做到这一点? (显然,代码仅适用于第一个矩阵(M1)。)

clear all;
close all;

%length of sequences
n = 6;

%nucleotide map

N = 'ACGT';

%initial probabilities
%     A    C    G    T
M0 = [0.25 0.25 0.25 0.25];

%transition probabilities (if row, then col)
%inside CpG
%     A       C       G       T
M1 = [0.18    0.274   0.426   0.12    %A
      0.171   0.367   0.274   0.188   %C
      0.161   0.339   0.375   0.125   %G
      0.079   0.355   0.384   0.182]; %T

%outside CpG
%     A       C       G       T
M2 = [0.3     0.205   0.285   0.21    %A
      0.322   0.298   0.078   0.302   %C
      0.248   0.246   0.298   0.208   %G
      0.177   0.239   0.292   0.292]; %T

sampletype = input('Matrix Type (M1 or M2): ', 's');

sampseq = zeros(n,1);
PM1 = 1;
PM2 = 1;
count = 0;

if strcmp(sampletype,'M1')
    for i = 1:10000
        for position = 1:n
            if position == 1
                firstnuc = randsample(N,1,true,M0);
                sampseq(1) = firstnuc;

                PM1 = PM1*0.25;
                PM2 = PM2*0.25;

            %check for previous nucleotide
            elseif position ~= 1
                if strcmp(sampseq(position-1),'A')
                    tempnuc = randsample(N,1,true,M1(1,:));
                    sampseq(position) = tempnuc;

                    PM1 = PM1*M1(1,findstr(N,tempnuc));
                    PM2 = PM2*M2(1,findstr(N,tempnuc));
                elseif strcmp(sampseq(position-1),'C')
                    tempnuc = randsample(N,1,true,M1(2,:));
                    sampseq(position) = tempnuc;

                    PM1 = PM1*M1(2,findstr(N,tempnuc));
                    PM2 = PM2*M2(2,findstr(N,tempnuc));
                elseif strcmp(sampseq(position-1),'G')
                    tempnuc = randsample(N,1,true,M1(3,:));
                    sampseq(position) = tempnuc;

                    PM1 = PM1*M1(3,findstr(N,tempnuc));
                    PM2 = PM2*M2(3,findstr(N,tempnuc));
                elseif strcmp(sampseq(position-1),'T')
                    tempnuc = randsample(N,1,true,M1(4,:));
                    sampseq(position) = tempnuc;

                    PM1 = PM1*M1(4,findstr(N,tempnuc));
                    PM2 = PM2*M2(4,findstr(N,tempnuc));
                end
            end
        end

        if PM2 > PM1
            count = count + 1;
        end
        PM1
        PM2
        PM1 = 1;
        PM2 = 1;
        sampseq
        sampseq = zeros(n,1);
    end
end

count

1 个答案:

答案 0 :(得分:1)

啊,我修好了。为了解决'ACGT'字符串创建的位指定,我刚刚创建了占位符:

%    A C G T
N = [1 2 3 4]

之后索引修复很简单。

我仍然有兴趣弄清楚如何生成“真实”序列,正如我最初尝试的那样,而不仅仅是1s,2s,3s和4s组合的串。