Question

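The title is a little vague, but I'm not sure how to put it differently. What I have is a very long array, say of length 10000, containing the values 1, 2 and 3. They generally come in long runs of the same number, e.g.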

[1111111111122222222211111222222222233333332222]

The data represent 3 states of something: 1, 2 and 3. The only possible transitions are 1 <-> 2 and 2 <-> 3, not 1 <-> 3.

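In general the runs are very long, so it is very unlikely to observe something like [111121111], where the state is 2 for a single element and then goes back. However, due to measurement errors, such things do creep in, and I'm trying to find a way to filter them out in MATLAB. So what I'd like to do is remove all elements for which the number of consecutive identical elements is smaller than some number X. If that's hard to do for general X, X = 1 would be a very good start!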

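Personally, I have no idea how to tackle this problem. I guess that diff can tell you where the elements change and where they change again, and from those indices you can somehow find the length of each run. Then, with some if condition, you can remove the short ones. This should probably be done backwards, since the size of the array will change. I'm still experimenting with this, but so far without success. Maybe someone can give me a hint?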
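
To make the diff idea from the paragraph above concrete, here is a minimal sketch of how run starts and run lengths could be computed. The sample data and the names runStart, runLen and runVal are made up for illustration; they do not come from the question.

```matlab
% Illustrative data: numeric row vector (all names here are hypothetical)
x = [1 1 1 1 2 1 1 1 1 2 2 2 3 3 3];

% Indices where a new run begins; the first element always starts a run
runStart = [1, find(diff(x) ~= 0) + 1];

% Length of each run: distance to the next run start (or to the end of x)
runLen = diff([runStart, numel(x) + 1]);

% Value taken by each run
runVal = x(runStart);

% One column per run: start index, length, value
disp([runStart; runLen; runVal])
```

Once runLen is known, the runs shorter than X can be picked out with a logical comparison, so no backward loop over a shrinking array is needed.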

Answer 1

Approach 1 (uses bsxfun. Inefficient; I recommend the second approach.¹)

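The following code detects the beginning of short runs. What to do then is not clear from your question (remove those entries? fill them with the preceding value?).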

x = '1111111111122222222211111222222222233333332222'; %// data (string)
len = 5; %// runs of this length or shorter will be detected

ind = find(diff(x-'0')~=0) + 1; %// index of changes
mat = bsxfun(@minus, ind.', ind); %'// distance between changes
mat = tril(mat); %// only distance to *previous* changes, not to *later* changes
mat(mat==0) = NaN;
result = ind(any(mat<=len)); %// index of beginning of short runs

In this example, the result is

result =
    21

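Note that the last run is not considered. In this example, the last run (4 elements) is shorter than len, yet it is not detected as too short. If you also need to detect that run, change the ind line to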

ind = find([diff(x-'0') inf]~=0) + 1;

In that case,

result =
    21    43
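
As a side note, in MATLAB R2016b and later the pairwise differences can be written with implicit expansion instead of bsxfun. The following is only a sketch of the same computation under that version assumption, not a change to the approach:

```matlab
x = '1111111111122222222211111222222222233333332222'; % data (string)
len = 5;                                  % runs of this length or shorter are detected

ind = find([diff(x-'0') inf] ~= 0) + 1;   % change positions, plus numel(x)+1
mat = tril(ind.' - ind);                  % implicit expansion (assumes R2016b or later)
mat(mat == 0) = NaN;                      % ignore the diagonal and the upper triangle
result = ind(any(mat <= len));            % index of beginning of short runs
```

The matrix still has one row and one column per change position, so the memory cost is the same as with bsxfun.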

Approach 2 (uses diff. More efficient than approach 1.)

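Each index is compared only with the preceding one, rather than with all the others as above. Also, according to the comments, short runs need to be replaced with the preceding value, and the last run should also be detected if it is short: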

%// Data
x = '1111111111122222222211111222222222233333332222'; %// data (string)
len = 5; %// runs of this length or shorter will be detected

%// Detect beginning of short runs
ind = find([diff(x-'0') inf]~=0) + 1;
starts = ind(diff(ind)<=len); %// index of beginning of short runs

%// Replace short runs with preceding value
ind = [ind numel(x)+1]; %// extend ind in case last run was detected as short
for k = find(diff(ind)<=len)
    x(ind(k):ind(k+1)-1) = x(ind(k)-1); %// replace
end
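
With the example data, the loop above overwrites the 5 ones starting at index 21 with '2' and the final 4 twos with '3'. If the short runs should instead be deleted (the other reading of the question), a logical mask built from the same ind vector might do it. This is only a sketch; keep and y are hypothetical names, and, as in the code above, the very first run is never tested.

```matlab
x = '1111111111122222222211111222222222233333332222'; % data (string)
len = 5;                                  % runs of this length or shorter are deleted

ind = find([diff(x-'0') inf] ~= 0) + 1;   % run starts (except the first run), plus numel(x)+1
keep = true(size(x));                     % hypothetical mask: true = keep the element
for k = find(diff(ind) <= len)
    keep(ind(k):ind(k+1)-1) = false;      % mark the whole short run for deletion
end
y = x(keep);                              % copy of x without the short runs
```

Because the deletion happens in one indexing step at the end, the indices in ind stay valid throughout the loop.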

¹ Why do I keep approach 1? Well, it got four upvotes before I came up with approach 2, so there must be something to it (I suspect it has to do with bsxfun...)

Answer 2

This could be one approach:

%%// Input string
a1 = '111111111112222222221111122222222221111133333332222'

th = 10 %%// Runs of 10 or fewer consecutive occurrences shall be removed

str1 = num2str(a1=='1','%1d')

t1 = strfind(['0' str1 '0'],'01')' %%//'
t2 = strfind(['0' str1 '0'],'10')' %%//'
t3 = [t1 t2-1]
t4 = t3([t2-t1]<=th,:)

ind1 = true(size(a1))
for k=1:size(t4,1)
  ind1(t4(k,1):t4(k,2))=false;
end
out = a1(ind1) %%// Output string

Output:

out =
11111111111222222222222222222233333332222
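
Note that the code above only looks at runs of '1'. If short runs of any value should be removed, one possible variant is to compute all run lengths with diff and expand a per-run flag into a per-element mask. This is a sketch rather than part of the answer: it assumes repelem is available (R2015a or later), uses th = 5 instead of 10 so that something is left of this short example, and the names runStart, runLen and keep are made up.

```matlab
a1 = '111111111112222222221111122222222221111133333332222'; % input string
th = 5;                                             % assumed threshold for this sketch

runStart = [1, find(diff(double(a1)) ~= 0) + 1];    % first index of every run
runLen   = diff([runStart, numel(a1) + 1]);         % length of every run
keep     = repelem(runLen > th, runLen);            % per-run flag expanded to a per-element mask
out      = a1(keep);                                % runs of th or fewer elements removed
```

This is a single pass: runs that become adjacent after the deletion are not re-examined.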

Filtering changes of short length from a sequence (MATLAB)
