Question

我正在尝试编写一个对数组进行混洗的函数，该数组包含重复的元素，但要确保重复的元素彼此之间的距离不太近。

此代码有效，但对我来说似乎效率很低：

function shuffledArr = distShuffle(myArr, myDist)
% this function takes an array myArr and shuffles it, while ensuring that repeating 
% elements are at least myDist elements away from on another    

% flag to indicate whether there are repetitions within myDist
reps = 1;
while reps 

    % set to 0 to break while-loop, will be set to 1 if it doesn't meet condition
    reps = 0;  

    % randomly shuffle array
    shuffledArr = Shuffle(myArr);

    % loop through each unique value, find its position, and calculate the distance to the next occurence
    for x = 1:length(unique(myArr))
        % check if there are any repetitions that are separated by myDist or less
       if any(diff(find(shuffledArr == x)) <= myDist)
           reps = 1;
       break;
   end
end
end

在我看来，这不是最佳选择，原因有以下三个：

1）在找到解决方案之前，可能不必反复洗牌。

2）如果没有可能的解决方案（即，将myDist设置得太高而无法找到合适的配置），则while循环将永远持续下去。关于如何提前了解这一点的任何想法？

3）确定数组中重复元素之间的距离的方法必须比通过遍历每个唯一值所做的操作更简单。

即使第1点是正确的，我也很感激第2点和第3点的答案，并且有可能在一次混洗中做到这一点。

Answer 1

我认为检查以下条件足以防止无限循环：

[~,num, C] = mode(myArr);
N = numel(C);
assert( (myDist<=N)  || (myDist-N+1) * (num-1) +N*num <= numel(myArr),...
'Shuffling impossible!');

假设myDist是2，我们有以下数据：

[4 6 5 1 6 7 4 6]

我们可以找到模式6及其发生的情况3。我们安排6，用2 = myDist空格隔开：

6 _ _ 6 _ _6

必须有(3-1) * myDist = 4个数字才能填补空白。现在我们还有五个数字，因此可以对数组进行混洗。

如果我们有多种模式，问题将变得更加复杂。例如，对于此数组[4 6 5 1 6 7 4 6 4]，我们有N=2模式：6和4。它们可以安排为：

6 4 _ 6 4 _ 6 4

我们有2个空格和另外三个数字[ 5 1 7]可用于填充空格。例如，如果我们只有一个数字[ 5]，那么就不可能填补空白，也就无法对数组进行混洗。

对于第三点，您可以使用稀疏矩阵来加快计算速度（我在Octave中的初步测试表明它更有效）：

function shuffledArr = distShuffleSparse(myArr, myDist)
    [U,~,idx] = unique(myArr);
    reps = true;
    while reps 
        S = Shuffle(idx);
        shuffledBin = sparse ( 1:numel(idx), S, true, numel(idx) + myDist, numel(U) );
        reps = any (diff(find(shuffledBin)) <= myDist);
    end
    shuffledArr = U(S);
end

或者，您可以使用sub2ind和sort代替稀疏矩阵：

function shuffledArr = distShuffleSparse(myArr, myDist)
    [U,~,idx] = unique(myArr);
    reps = true;
    while reps 
        S = Shuffle(idx);
        f = sub2ind ( [numel(idx) + myDist, numel(U)] , 1:numel(idx), S );
        reps = any (diff(sort(f)) <= myDist);
    end
    shuffledArr = U(S);
end

Answer 2

如果您只想找到一种可能的解决方案，则可以使用类似的方法：

x = [1   1   1   2   2   2   3   3   3   3   3   4   5   5   6   7   8   9];
n = numel(x);
dist = 3;           %minimal distance
uni = unique(x);    %get the unique value
his = histc(x,uni); %count the occurence of each element
s = [sortrows([uni;his].',2,'descend'), zeros(length(uni),1)];

xr = [];            %the vector that will contains the solution

%the for loop that will maximize the distance of each element
for ii = 1:n
    s(s(:,3)<0,3) = s(s(:,3)<0,3)+1;
    s(1,3) = s(1,3)-dist;
    s(1,2) = s(1,2)-1; 
    xr = [xr s(1,1)];
    s = sortrows(s,[3,2],{'descend','descend'})
end

if any(s(:,2)~=0)
    fprintf('failed, dist is too big')
end

结果：

xr = [3   1   2   5   3   1   2   4   3   6   7   8   3   9   5   1   2   3]

说明：

我创建了一个向量s，并且在开始时s等于：

s =

   3   5   0
   1   3   0
   2   3   0
   5   2   0
   4   1   0
   6   1   0
   7   1   0
   8   1   0
   9   1   0

%col1 = unique element; col2 = occurence of each element, col3 = penalities

在for循环的每次迭代中，我们选择出现次数最多的元素，因为该元素将更难放置在数组中。

然后在第一次迭代s之后等于：

s =

   1   3   0  %1 is the next element that will be placed in our array.
   2   3   0
   5   2   0
   4   1   0
   6   1   0
   7   1   0
   8   1   0
   9   1   0
   3   4  -3  %3 has now 5-1 = 4 occurence and a penalities of -3 so it won't show up the next 3 iterations.

如果不是最小距离太大，最后第二列的每个数字应等于0。

排列数组，同时间隔重复元素

2 个答案: