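Suppose we have a bio_sequence such as:
VYDDGYHNGN
We want to insert a random number of '.' characters at random positions along the sequence, like this:
..VY.DD...GY..HN.GN..
Is there a built-in function or a best solution for this in MATLAB?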
Answer 0: (score: 3)
A cumsum-based approach -
seq = 'VYDDGYHNGN'; %// Input sequence
N = numel(seq); %// number of elements in input sequence
grplen = ceil(0.2*N); %// maximum step between consecutive elements (up to grplen-1 dots per gap)
idx = cumsum(randi(grplen,1,N)); %// random, strictly increasing indices for elements in output sequence
outseq = repmat('.',1, idx(end)+randi(grplen,1) ); %// placeholder for output, with 1 to grplen trailing dots
outseq(idx)=seq; %// Put elements from seq into outseq at random places indexed by idx
outseq %// display the result
Sample output -
outseq =
V.YDDG.Y.H.NG.N.
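To see how cumsum turns step sizes into output positions, here is a minimal sketch that uses hand-picked steps instead of random ones. The variable name steps and its values are assumptions chosen for illustration; they are not part of the answer above.

```matlab
seq = 'VYDD';                 %// short illustrative sequence
steps = [2 1 3 1];            %// hypothetical fixed steps (the answer draws these with randi)
idx = cumsum(steps);          %// positions of the letters: [2 3 6 7]
outseq = repmat('.', 1, idx(end) + 1);  %// all dots, with one trailing dot
outseq(idx) = seq;            %// drop the letters into their positions
disp(outseq)                  %// prints .VY..DD.
```

A step of 1 places a letter directly after the previous one, while a step of k leaves k-1 dots in between, so the order of the letters is always preserved.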
Answer 1: (score: 2)
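Based on your comments, the following assumes that the number of dots is chosen at random between 0 and 20% of the sequence length (rounded), and that the sequence elements land at uniformly random positions in the result.
Code: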
%// Data
seq = 'VYDDGYHNGN';
%// Let's go
m = numel(seq); %// sequence length
n = randi([0 round(.2*m)]); %// number of dots
p = m+n;
result = repmat('.', 1, p); %// initialize result to all dots
result(sort(randsample(p,m))) = seq; %// place sequence in uniformly random positions (randsample needs the Statistics Toolbox)
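A quick way to gain confidence in this approach is to repeat it many times and check two properties: removing the dots gives back the original sequence, and the number of dots equals the value that was drawn. The sketch below is only an illustration; it swaps randsample for the built-in randperm(p,m) so that it runs without the Statistics Toolbox, and the names maxDots and trial are assumptions introduced here.

```matlab
seq = 'VYDDGYHNGN';
m = numel(seq);                       %// sequence length
maxDots = round(0.2*m);               %// hypothetical name for the upper bound on dots
for trial = 1:1000
    n = randi([0 maxDots]);           %// number of dots for this trial
    p = m + n;
    result = repmat('.', 1, p);       %// start from all dots
    result(sort(randperm(p, m))) = seq;   %// m distinct positions, in increasing order
    assert(strcmp(result(result ~= '.'), seq));  %// letters keep their order
    assert(sum(result == '.') == n);             %// exactly n dots remain
end
```

Sorting the sampled positions is what keeps the letters in their original order; without sort the sequence would be shuffled.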
Answer 2: (score: 0)
This works:
sequence = 'VYDDGYHNGN';
factor = .2;
n = numel(sequence);
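len = round(randi(n,n+1,1)-(1-factor)*n); %// dots per gap; values <= 0 mean no dots
sequence = [sequence repmat('.',1,len(n+1))]; %// trailing dots
for idx = n:-1:1 %// go backwards so the remaining indices stay valid
sequence= [sequence(1:idx-1) repmat('.',1,len(idx)) sequence(idx:end)]; %// dots before element idx
end
sequence %// display the result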
The loop can be avoided as follows:
sequence = 'VYDDGYHNGN';
factor = .2;
n = numel(sequence);
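len = round(randi(n,n+1,1)-(1-factor)*n); %// dots per gap; values <= 0 mean no dots
seqdot = repmat('.',1,n+sum(max(0,len))); %// all dots, sized for letters plus dots
seqdot((1:n) + cumsum(max(0,len(1:end-1)))') = sequence; %// shift each letter by the dots before it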
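As a sanity check, the loop and the loop-free variant can be compared on the same gap sizes. The sketch below fixes len by hand rather than drawing it at random; the particular values are an assumption, picked so that the result matches the example string in the question, and the name looped is introduced here for illustration.

```matlab
sequence = 'VYDDGYHNGN';
n = numel(sequence);
len = [2 0 1 0 3 0 2 0 1 0 2]';   %// hypothetical fixed gaps: n before-letter gaps plus one trailing gap

%// loop version: insert dots from the back so indices stay valid
looped = [sequence repmat('.', 1, len(n+1))];
for idx = n:-1:1
    looped = [looped(1:idx-1) repmat('.', 1, len(idx)) looped(idx:end)];
end

%// loop-free version: each letter is shifted by the total dots before it
seqdot = repmat('.', 1, n + sum(len));
seqdot((1:n) + cumsum(len(1:end-1))') = sequence;

disp(seqdot)                      %// prints ..VY.DD...GY..HN.GN..
assert(isequal(looped, seqdot));  %// both variants agree
```

Because the gaps here are all non-negative, the max(0,len) guard from the loop-free code is not needed in this sketch.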