Find the smallest sublist containing all K elements

Time: 2019-05-30 18:36:53

Tags: prolog

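Suppose I have the list [1,3,1,3,1,3,3,2,2,1] and N=3 (the number of distinct values) as input. I want the size of the smallest sublist of this list that contains all N values. In this example the size is 4, for [1,3,3,2] or [3,2,2,1]. Here is what I have so far for K=[1,2,3] and List=[1,3,1,3,1,3,3,2,2,1]: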

append(Subl,_,List),
subset(K,Subl),!,
append(_,Subl2,Subl),
length(Subl2,L),
subset(K,Subl2).

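The first append/3, together with the first subset/2, finds the first prefix that contains all K elements, in this case [1,3,1,3,1,3,3,2]. After that, I try to shrink this list as much as possible without it losing any of the K distinct numbers. In this case the final result would be [1,3,3,2] with L = 4. My problem is this: going from [1,3,1,3,1,3,3,2] to [1,3,3,2], L goes from 8 down to 4. How do I update (maybe store somehow?) this value each time I find a smaller sublist that contains all K numbers? Once I have [1,3,3,2], I could use \+ subset(K,Subl2) and the result would be [3,3,2]. How should I then append the remaining elements of List to [3,3,2] (those elements would be [2,2,1]) and start the whole process again?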

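PS: I found other solutions in earlier SO posts, but they generate every possible sublist from size 3 to size 10 and check whether each one contains all K elements. I think we have to use a sliding-window approach here, right?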
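A sliding window does work for this problem. Below is a minimal sketch of the usual two-pointer variant. It is not taken from either answer. The idea is to extend the window on the right, and while the window still contains all K values, record its length and drop elements from the left.

The predicate names (min_window_len/3, expand/8, shrink_window/11, add/5, remove/5) are hypothetical. The sketch uses SWI-Prolog's library(assoc) to keep a count for each value. It also assumes, as in the question, that the list holds exactly the K distinct values being searched for.

```prolog
:- use_module(library(assoc)).

% min_window_len(+List, +K, -Min)
% Min is the length of the shortest contiguous sublist of List that holds
% K distinct values. Assumes List contains exactly the K wanted values.
% Fails if no such sublist exists.
min_window_len(List, K, Min) :-
    length(List, N),
    Inf is N + 1,                      % sentinel: longer than any window
    empty_assoc(Counts0),
    expand(List, List, 0, Counts0, 0, K, Inf, Min),
    Min =< N.

% expand(+Right, +Left, +Len, +Counts, +Distinct, +K, +Best0, -Best)
% Right: elements not yet added; Left: list starting at the window's
% first element; Len: window length; Distinct: values with count > 0.
expand([], _, _, _, _, _, Best, Best).
expand([X|Right], Left0, Len0, C0, D0, K, Best0, Best) :-
    add(X, C0, C1, D0, D1),
    Len1 is Len0 + 1,
    shrink_window(Left0, Left, Len1, Len, C1, C, D1, D, K, Best0, Best1),
    expand(Right, Left, Len, C, D, K, Best1, Best).

% While the window holds all K values, record its length and drop its
% leftmost element.
shrink_window(Left, Left, Len, Len, C, C, D, D, K, Best, Best) :-
    D < K, !.
shrink_window([Y|Left0], Left, Len0, Len, C0, C, D0, D, K, Best0, Best) :-
    Best1 is min(Best0, Len0),
    remove(Y, C0, C1, D0, D1),
    Len1 is Len0 - 1,
    shrink_window(Left0, Left, Len1, Len, C1, C, D1, D, K, Best1, Best).

% Increment the count of X; a value going from 0 to 1 adds a distinct value.
add(X, C0, C, D0, D) :-
    (   get_assoc(X, C0, N0) -> true ; N0 = 0 ),
    N is N0 + 1,
    put_assoc(X, C0, N, C),
    (   N0 =:= 0 -> D is D0 + 1 ; D = D0 ).

% Decrement the count of Y; a value dropping to 0 removes a distinct value.
remove(Y, C0, C, D0, D) :-
    get_assoc(Y, C0, N0),
    N is N0 - 1,
    put_assoc(Y, C0, N, C),
    (   N =:= 0 -> D is D0 - 1 ; D = D0 ).
```

Each element enters and leaves the window at most once, and each assoc operation is O(log K), so the scan should take O(n log K) time. For example, `?- min_window_len([1,3,1,3,1,3,3,2,2,1], 3, L).` gives `L = 4`.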

2 answers:

Answer 0 (score: 0)

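I think the following code might work for you. I haven't proved that it is correct, but it seems to work (even for very large inputs). In any case, you can use it as a starting point for a correct implementation.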

shortest_length(List, K, ShortestLength) :-
    shortest_sublist(List, K, Sublist),
    length(Sublist, ShortestLength).

shortest_sublist(List, K, Sublist) :-
    split(List, K, Prefix, Suffix),              % find minimum prefix with all K items
    shrink(Prefix, [First|Rest]),                % shrink that prefix to get a sublist
    append(Rest, Suffix, NewList),
    (   shortest_sublist(NewList, K, NewSublist) % find a new sublist in the rest of the list
    ->  length([First|Rest], Len1),
        length(NewSublist, Len2),
        (   Len1 < Len2
        ->  Sublist = [First|Rest]               % new sublist is not shorter than the previous
        ;   Sublist = NewSublist )               % new sublist is at least as short as the previous
    ;   Sublist = [First|Rest]  ).               % a new sublist was not found


split(List, K, Prefix, Suffix) :-
    append(Prefix, Suffix, List),
    has(Prefix, K), !.

has(List, K) :-
    forall( between(1, K, Item),
            memberchk(Item, List) ).

shrink([First|Rest], ShrunkList) :-
    (   memberchk(First, Rest)
    ->  shrink(Rest, ShrunkList)
    ;   ShrunkList = [First|Rest] ).

Results for some small inputs:

?- shortest_length([1,3,1,3,1,3,3,2,2,1], 3, N).
N = 4.

?- shortest_sublist([1,3,1,3,1,3,3,2,2,1], 3, S).
S = [3, 2, 2, 1].

?- shortest_sublist([1,3,1,3,1,3,3,2,2,1,3,3], 3, S).
S = [2, 1, 3].
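The correctness of the code above is not proved. One way to gain confidence is to compare its results on small inputs against a brute-force reference, which is the approach mentioned in the question's PS.

Below is a minimal sketch of such a reference. The names window/2, covers/2 and brute_min_length/3 are hypothetical. The required values are passed as a list Ks, as in the question, rather than as a count.

```prolog
:- use_module(library(lists)).
:- use_module(library(aggregate)).

% window(+List, -Window): Window is a non-empty contiguous sublist of List.
window(List, Window) :-
    append(_, Suffix, List),
    append(Window, _, Suffix),
    Window \== [].

% covers(+Ks, +Window): every value in Ks occurs in Window.
covers(Ks, Window) :-
    forall(member(X, Ks), memberchk(X, Window)).

% brute_min_length(+Ks, +List, -Min): length of the shortest window that
% covers Ks; fails when no window does.
brute_min_length(Ks, List, Min) :-
    aggregate_all(min(Len),
                  ( window(List, W), covers(Ks, W), length(W, Len) ),
                  Min).
```

For example, `?- brute_min_length([1,2,3], [1,3,1,3,1,3,3,2,2,1,3,3], N).` gives `N = 3`, which matches the length of `[2, 1, 3]` above. The sketch examines O(n²) windows and scans each one, so it is only suitable for small test lists.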

Results for some larger inputs:

?- length(L, 500000), maplist(random(1,5),L), time(shortest_sublist(L, 4, S)).
% 11,153,796 inferences, 1.766 CPU in -712561273706905600.000 seconds (?% CPU, 6317194 Lips)
L = [2, 1, 3, 4, 2, 2, 4, 4, 3|...],
S = [4, 1, 2, 3].

?- length(L, 1000000), maplist(random(1,5),L), time(shortest_sublist(L, 4, S)).
% 22,349,463 inferences, 3.672 CPU in -657663431226163200.000 seconds (?% CPU, 6086662 Lips)
L = [2, 2, 4, 3, 2, 2, 3, 1, 2|...],
S = [2, 1, 4, 3].

?- length(L, 2000000), maplist(random(1,5),L), time(shortest_sublist(L, 4, S)).
% 44,655,878 inferences, 6.844 CPU in 919641833393356800.000 seconds (0% CPU, 6525060 Lips)
L = [4, 1, 3, 3, 4, 3, 3, 3, 2|...],
S = [2, 1, 3, 4].

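For small values of K, the algorithm seems to take time proportional to n, i.e. O(n). Note that when the length of the list doubles, the execution time also doubles (500000 → ~1.8 s, 1000000 → ~3.7 s, 2000000 → ~6.9 s).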

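I think the bottleneck is the predicate has/2. So for a more efficient implementation (for larger values of K), you need a more efficient strategy for checking list membership.

Answer 1 (score: 0)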
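One possible cheaper membership check is to sort the prefix once instead of scanning it K times with memberchk/2. Below is a minimal sketch of this idea.

The name has_sorted/2 is hypothetical. It is meant as a drop-in replacement for has/2 in split/4. It relies on sort/2 removing duplicates and on ord_subset/2 from library(ordsets).

```prolog
:- use_module(library(lists)).    % numlist/3
:- use_module(library(ordsets)).  % ord_subset/2

% has_sorted(+List, +K): true if every integer 1..K occurs in List.
% One O(m log m) sort replaces K separate memberchk/2 scans of the prefix.
has_sorted(List, K) :-
    sort(List, Set),              % sorted, duplicates removed
    numlist(1, K, Wanted),        % [1, 2, ..., K], already an ordset
    ord_subset(Wanted, Set).
```

Note that split/4 still calls the check once per candidate prefix. This change only lowers the cost of each check from O(m·K) to O(m log m) for a prefix of length m; it does not make the search incremental.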

I tried another approach to handle large lists.

I use a list of the numbers being searched for, each paired with the index of its last occurrence in the list.

I get about the same results as Simvio Lago.
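To make this bookkeeping concrete, below is a small sketch that only replays the index-list updates (the select/3 and last/2 steps of walk/8) over a list and reports where the current window starts. The names initial_state/3, step/3 and window_start/4 are hypothetical and are not part of the answer.

```prolog
:- use_module(library(lists)).   % numlist/3, select/3, last/2
:- use_module(library(apply)).   % foldl/4

% Every searched value starts with last-seen index -1.
initial_state(Min, Max, Il) :-
    numlist(Min, Max, Ns),
    findall((-1)-U, member(U, Ns), Il).

% Move H to the front of the list, paired with its new index N,
% so the list stays ordered by decreasing last-seen index.
step(H, N-Il0, N1-[N-H|Il1]) :-
    select(_-H, Il0, Il1),
    N1 is N + 1.

% After consuming List, the last pair holds the smallest last-seen index,
% i.e. the start of the shortest window ending at the last element
% (-1 while some value has not been seen yet).
window_start(Min, Max, List, Start) :-
    initial_state(Min, Max, Il0),
    foldl(step, List, 0-Il0, _-Il),
    last(Il, Start-_).
```

For the values 1..3 and the prefix [1,3,1,3,1,3,3,2], the index list becomes [7-2, 6-3, 4-1]. The window therefore starts at index 4, and [1,3,3,2] spans indexes 4..7, which is the length-4 answer from the question.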

getValue(In, Ind, V) :-
    nth0(Ind, In, V).


% create the list of the numbers paired with the index of their last appearance
% (-1 if not seen yet);
% the list is kept sorted in decreasing order of these indexes
make_indice(U, -1 - U).

minSubList(Min, Max, In, Out) :-
    numlist(Min, Max, NL),
    maplist(make_indice, NL, Il),

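    % at the beginning, the best length is one more than the length of the
    % input, so that a window spanning the whole input is still recorded
    % (and minSubList/4 fails if no window contains all the numbers)
    length(In, Len),
    Len0 is Len + 1,

    % main predicate of the process
    walk(In, 0, Len0, Il, 0, Len, VMin, VMax),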

    % now we get the result
    numlist(VMin, VMax, NL1),
    maplist(getValue(In), NL1, Out).

% if the list is empty, the process is finished
walk([], _, _, _IL, Min, Max, Min, Max).

% @arg1 current input to process
% @arg2 index of the head of the input in the initial input
% @arg3 current length of the shortest sublist containing all of the numbers
% @arg4 current list of the numbers with their index in the initial list
% @arg5 current first index where we find all the numbers
% @arg6 current last index where we find all the numbers
% @arg7 final first index where we find all the numbers
% @arg8 final last index where we find all the numbers

walk([H|T], N, Len, Il,  CurMin, CurMax, Min, Max) :-
    % we remove the element of the index list concerning H
    select(_-H, Il, IlTemp),
    % we build the new index list: H becomes its first element
    % because it is the most recently seen number
    LstInd = [N-H | IlTemp],
    % we need the index of the number seen least recently,
    % which is stored in the last element of the list
    last(LstInd, V - _),
    N1 is N+1,
    (   V = -1
    ->  % at least one number has not been seen yet:
        % keep going
        walk(T, N1, Len, LstInd, CurMin, CurMax, Min, Max)
    ;   % all the numbers have been seen:
        % update the best sublist if this window is shorter
        Len1 is N-V+1,
        (   Len1 < Len
        ->  NewLen = Len1,
            NewMin = V,
            NewMax = N
        ;   NewLen = Len,
            NewMin = CurMin,
            NewMax = CurMax),
        walk(T, N1, NewLen, LstInd, NewMin, NewMax, Min, Max)).

I get these results:

?- length(L, 500000), maplist(random(1,5),L), time(minSubList(1,4,L, S)).
% 8,750,508 inferences, 0.922 CPU in 0.922 seconds (100% CPU, 9486338 Lips)
L = [2, 2, 2, 2, 1, 1, 4, 2, 4|...],
S = [3, 2, 1, 4] .

?- length(L, 1000000), maplist(random(1,5),L), time(minSubList(1,4,L, S)).
% 17,502,632 inferences, 3.017 CPU in 3.017 seconds (100% CPU, 5800726 Lips)
L = [4, 3, 3, 4, 1, 1, 4, 4, 3|...],
S = [4, 3, 2, 1] .

?- length(L, 2000000), maplist(random(1,5),L), time(minSubList(1,4,L, S)).
% 34,999,875 inferences, 6.836 CPU in 6.836 seconds (100% CPU, 5119639 Lips)
L = [2, 1, 2, 1, 3, 3, 3, 1, 3|...],
S = [4, 3, 2, 1] .

Instead of using nth0/3 to build the result list, I also tried the following version with append/2, but it takes longer:

minSubList(Min, Max, In, Out) :-
    numlist(Min, Max, NL),
    maplist(make_indice, NL, Il),

    length(In, Len),
    Len0 is Len + 1,              % longer than any real window, so a whole-list window is recorded
    walk(In, 0, Len0, Il, 0, Len, VMin, VMax),

    Len2 is VMax - VMin +1,
    length(W, VMin),
    length(Out, Len2),
    append([W, Out, _], In).