Shortest string containing all fixed-length subsets of a character list as substrings (any one permutation of each subset)

Date: 2016-10-22 17:34:30

Tags: string algorithm graph-algorithm

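Given a list of characters {s1, s2, s3, ..., s10}, I want to find a shortest string in which every unordered subset of size 3 appears as a substring. For example, for the subset {s2, s4, s9}, the string must contain at least one substring made up of exactly these three characters, in any order. Characters are not repeated within a subset, so substrings of the form "s1s1s1" do not need to be included.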

1 answer:

Answer 0: (score: 0)

I solved this problem with the MiniZinc constraint solver:

%  dimensions
int: N = 10;  %  number of characters
set of int: Characters = 1..N;
int: L = 416;  %  candidate length of the string (fixed, since the model only solves for satisfiability)

%  decision variables
array[0..L-1] of var Characters: shortest;

%  every unordered subset must occur somewhere in shortest
constraint forall(a, b, c in 1..N where (a < b) /\ (b < c)) (
    exists(i in 0..L-3) (
        ((shortest[i] == a) \/(shortest[i+1] == a) \/ (shortest[i+2] == a)) /\
        ((shortest[i] == b) \/(shortest[i+1] == b) \/ (shortest[i+2] == b)) /\
        ((shortest[i] == c) \/(shortest[i+1] == c) \/ (shortest[i+2] == c))
    )
  );

%  to speed things up, we enforce the first N entries
constraint forall(i in 0..N-1) (
  shortest[i] == i+1
);

%  further speedup: adjacent entries are probably different
constraint forall(i in N..L-2) (
  shortest[i] != shortest[i+1]
);

solve satisfy;

%  Output solution as table of variable value assignments
output 
[ show(shortest[i]) ++ " " | i in 0..L-1 ];

For a character set of 5 characters, a solution is found immediately:

1 2 3 4 5 1 2 4 1 3 5 2 4 
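
A solution like this can be checked independently of the solver. The following is a minimal Python sketch, not part of the MiniZinc model; the helper name `missing_subsets` is made up for illustration. It collects every window of 3 adjacent entries as a set and reports which 3-subsets of 1..n are not covered.

```python
from itertools import combinations

def missing_subsets(string, n, k=3):
    """Return the k-subsets of 1..n that no window of k adjacent entries covers."""
    # A window with a repeated character yields a set smaller than k,
    # so it can never match a k-subset.
    covered = {frozenset(string[i:i + k]) for i in range(len(string) - k + 1)}
    return [s for s in combinations(range(1, n + 1), k)
            if frozenset(s) not in covered]

# The 5-character solution reported above.
solution = [1, 2, 3, 4, 5, 1, 2, 4, 1, 3, 5, 2, 4]
print(len(solution), missing_subsets(solution, 5))  # prints: 13 []
```

An empty list means all 10 subsets of size 3 occur somewhere in the string.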

But for more characters, let alone 10, the search takes too long to be practical.
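
When the exact search is out of reach, a valid but not necessarily shortest string can still be built quickly. The sketch below uses a simple greedy heuristic, which is a different technique from the constraint model above and gives only an upper bound on the minimum length. The function name `greedy_string` is hypothetical.

```python
from itertools import combinations

def greedy_string(n):
    """Greedy heuristic: build a string covering every 3-subset of 1..n.

    The result is valid but not guaranteed to be shortest.
    """
    uncovered = {frozenset(s) for s in combinations(range(1, n + 1), 3)}
    result = []

    def append(ch):
        result.append(ch)
        # Mark the newest window of 3 as covered (no-op if it is not a 3-subset).
        uncovered.discard(frozenset(result[-3:]))

    while uncovered:
        tail = result[-2:]
        # Prefer one character that completes a still-uncovered window.
        step = next((c for c in range(1, n + 1)
                     if frozenset(tail + [c]) in uncovered), None)
        if step is not None:
            append(step)
        else:
            # Otherwise write out one uncovered subset in full (3 characters).
            for ch in sorted(min(uncovered, key=sorted)):
                append(ch)
    return result

s = greedy_string(10)
print(len(s), " ".join(map(str, s)))
```

Every pass of the loop covers at least one new subset, so the loop terminates. The printed length only shows what this heuristic achieves; it says nothing about the true minimum.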

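I noticed that the minimum length seems to roughly double with each additional character. For 3 characters, the length is trivially 3. For 4 characters it is 6, and for 5 characters it is 13. But I could not find solutions for 6 or more characters.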
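
These small cases can be compared against a simple counting bound: a string of length L has L - 2 windows of 3 adjacent entries, and each window covers at most one subset, so L must be at least C(n,3) + 2. That gives 3, 6, 12 and 122 for 3, 4, 5 and 10 characters. The Python sketch below computes the bound and brute-forces the smallest cases; the function names and the `max_len` cutoff are assumptions for illustration, and the exhaustive search is only practical for very small n.

```python
from itertools import combinations, product
from math import comb

def lower_bound(n, k=3):
    """Counting bound: one window per subset, plus k - 1 leading entries."""
    return comb(n, k) + k - 1

def covers_all(string, n, k=3):
    """True if every k-subset of 1..n appears as a window of k adjacent entries."""
    covered = {frozenset(string[i:i + k]) for i in range(len(string) - k + 1)}
    return all(frozenset(s) in covered for s in combinations(range(1, n + 1), k))

def shortest_length(n, k=3, max_len=8):
    """Exhaustive search for the minimum length (requires n >= k), or None above max_len.

    The first k entries are fixed to 1..k: a shortest string must start with
    k distinct characters, and those can be relabelled to 1..k.
    """
    prefix = list(range(1, k + 1))
    for length in range(lower_bound(n, k), max_len + 1):
        for tail in product(range(1, n + 1), repeat=length - k):
            if covers_all(prefix + list(tail), n, k):
                return length
    return None

for n in (3, 4, 5, 10):
    print("lower bound for", n, "characters:", lower_bound(n))
for n in (3, 4):
    print("shortest length for", n, "characters:", shortest_length(n))
```

The bound is met for 3 and 4 characters, but the length 13 reported for 5 characters exceeds the bound of 12, so the bound is not tight in general.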

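I found a related paper, "On strings containing all subsets as substrings", which confirms my result for 5 characters. However, that paper was published in 1978, so more recent results may exist.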