Question

I am doing some signal analysis, part of which is finding the longest subsequence.

I have the following dictionary:

sequenceDict = {
    0: [168, 360, 470],
    1: [279, 361, 471, 633, 729, 817],
    2: [32, 168, 170, 350, 634, 730, 818],
    3: [33, 155, 171, 363, 635, 731, 765, 819],
    4: [352, 364, 732, 766, 822],
    5: [157, 173, 353, 577, 637, 733, 823, 969],
    6: [158, 174, 578, 638, 706, 734, 824],
    7: [159, 175, 579, 707, 735],
    8: [160, 464, 640, 708, 826],
    9: [173, 709, 757, 827],
    10: [174, 540, 642, 666, 710],
    11: [253, 667, 711],
    12: [254, 304, 668],
    13: [181, 255, 831],
    14: [256, 340, 646, 832],
    16: [184, 416], 
    17: [417], 
    18: [418], 
    19: [875], 
    20: [876], 
    23: [217], 
    24: [168, 218, 880], 
    25: [219, 765, 881], 
    26: [220, 766], 
    27: [221], 
    28: [768], 
    29: [3, 769], 
    30: [344, 476, 706]}

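These are basically always sorted indices of another array, and I would like to find the longest increasing sequence (just like a longest increasing subsequence) by picking only one number from each key in order (key 2 comes right after key 1, and so on). For example, from keys 0 and 1, [360, 361] is one sequence and [470, 471] is another. I call these increasing sequences because the numbers should increase strictly by 1.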
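To make the rule concrete, here is a minimal sketch that finds the two-element sequences between keys 0 and 1 of the dictionary above. The helper name `consecutive_pairs` is made up for illustration and is not part of the question.

```python
# Values copied from keys 0 and 1 of sequenceDict.
key0 = [168, 360, 470]
key1 = [279, 361, 471, 633, 729, 817]

def consecutive_pairs(prev_values, next_values):
    """Pair each value v of the previous key with v + 1 if the next key has it."""
    next_set = set(next_values)  # set for O(1) membership tests
    return [(v, v + 1) for v in prev_values if v + 1 in next_set]

pairs = consecutive_pairs(key0, key1)
print(pairs)  # [(360, 361), (470, 471)]
```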

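I have looked at things like patience sorting, but this problem is slightly different and also involves a tree of sequences. Are there any known Python implementations, or other efficient ways to do this, other than generating all possible sequences from this dictionary and then running patience sort?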

Answer 1

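I would simply implement a "brute force" solution...

Keep a list of "current sequences", initially empty.
For each key, check whether any of the current sequences can be extended by one step. When a sequence is extended, also update the best solution found so far.
For any number that was not used to extend a sequence, start a new sequence of length 1.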

Python's set may be a reasonable choice for this... here is a sample implementation:

    best = None
    current_sequences = set()
    last_key = None
    for key in sorted(sequenceDict.keys()):
        data = set(sequenceDict[key])
        new_sequences = set()
        if last_key == key-1:
            # no gap in key value, may be some sequence got extended
            for val, count in current_sequences:
                if val+1 in data:
                    # found a continuation, keep this sequence
                    new_sequences.add((val+1, count+1))
                    data.remove(val+1)
                    if best is None or count+1 > best[0]:
                        # we've got a new champion
                        best = count+1, val+1, key
        # add new sequences starting here
        for v in data:
            new_sequences.add((v, 1))
            if best is None:
                best = 1, v, key
        current_sequences = new_sequences
        last_key = key

One tricky part is that if there is a gap in the keys, a sequence cannot be extended; this is what last_key is for.

The complexity should be O(input_size × average_number_of_sequences). It is only a gut feeling, but my guess is that you cannot go lower than that. I was tempted by the idea of using value - key to associate a constant value with each sequence... however, this would not detect "gaps" (i.e. value 100 in key 1 and value 102 in key 3, but no 101 in key 2).

With the input from the question, the solution is (7, 735, 7), meaning a 7-element sequence ending with value 735 at key 7.
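The "one constant per sequence" idea and its gap problem can be illustrated with a rough sketch. It assumes the constant is value - key (every element of a valid sequence has the same difference) and adds an explicit check that the keys are consecutive. This is a variant written for illustration, not the implementation from this answer, and the function name `longest_run_by_offset` is hypothetical.

```python
from collections import defaultdict

def longest_run_by_offset(sequence_dict):
    """Return (length, last_value, last_key) of the longest run, or None if empty."""
    # Assumption: a run is identified by the constant offset = value - key.
    keys_by_offset = defaultdict(list)
    for key in sorted(sequence_dict):
        for value in set(sequence_dict[key]):  # set() guards against duplicates
            keys_by_offset[value - key].append(key)

    best = None
    for offset, keys in keys_by_offset.items():
        run = 0
        prev = None
        for key in keys:  # keys were appended in ascending order
            # The run continues only if the previous key is exactly key - 1;
            # a missing key is a gap, so the run restarts at length 1.
            run = run + 1 if prev is not None and key == prev + 1 else 1
            prev = key
            if best is None or run > best[0]:
                best = (run, key + offset, key)
    return best

# Same offset (99) in keys 1 and 3, but key 2 is missing: no run longer than 1.
print(longest_run_by_offset({1: [100], 3: [102]}))            # (1, 100, 1)
print(longest_run_by_offset({1: [100], 2: [101], 3: [102]}))  # (3, 102, 3)
```

Grouping by offset alone would report the first example as one sequence, which is the gap problem described above; the consecutive-key check is what prevents that.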

Answer 2

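Compared with @6502's solution, this one not only keeps the best solution but also keeps track of every increasing subsequence, in case that is more helpful.

The idea is similar to a sliding-window approach. You start with the first list, update the currentHotItems and globalHotItems dictionaries, then look at the second list and update the dictionaries again, and so on.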

# fill missing indexes in the dictionary:
for i in range(min(sequenceDict), max(sequenceDict)):
    if i not in sequenceDict:
        sequenceDict[i] = []

# get only lists, ordered by key:
sortedItems = [values for _, values in sorted(sequenceDict.items())]
globalHotItems = {} # (value, startIndex): length
currentHotItems = {} # value: length
# note: startIndex is a position in sortedItems; it equals the key here
# only because the smallest key is 0

for i in range(len(sortedItems)):
    updatedHotItems = {} # updated value: length
    for item in sortedItems[i]:
        if (item - 1) in currentHotItems:
            updatedHotItems[item] = currentHotItems[item-1] + 1
        else:
            updatedHotItems[item] = 1

    # a sequence ending at value v is dead if v + 1 is not in the current list
    deadSet = set(currentHotItems) - {key - 1 for key in updatedHotItems}

    for item in deadSet:
        globalHotItems[ (item-currentHotItems[item]+1, i-currentHotItems[item]) ] = currentHotItems[item]

    currentHotItems = updatedHotItems

# sequences still alive after the last list are finished too:
for item, length in currentHotItems.items():
    globalHotItems[ (item-length+1, len(sortedItems)-length) ] = length

print(sorted(globalHotItems.items(), key=lambda x: x[1])[-1])

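globalHotItems is the dictionary that contains the results. The key is (value, startIndex) and the value is the length.

For example, the last 4 items in globalHotItems:

print(sorted(globalHotItems.items(), key=lambda x: x[1])[-4:])

are (entries of equal length, such as the length-4 ones, may come out in a different order depending on the Python version):

[((157, 5), 4), ((217, 23), 5), ((706, 6), 6), ((729, 1), 7)]

This means the best solution has length 7 and starts with 729 in the index=1 list. The second-best solution has length 6 and starts with 706 in the index=6 list, and so on.

Complexity:

I think the complexity should be: O(input_size × average_number_of_sequences)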
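An entry of globalHotItems has the form ((value, startIndex), length), which is enough to rebuild the full sequence. A minimal sketch, where the helper name `expand_entry` is made up for illustration:

```python
def expand_entry(entry):
    """Turn ((start_value, start_index), length) into a list of (index, value) pairs."""
    (start_value, start_index), length = entry
    # Index and value both advance by 1 at every step of the sequence.
    return [(start_index + k, start_value + k) for k in range(length)]

print(expand_entry(((729, 1), 7)))
# [(1, 729), (2, 730), (3, 731), (4, 732), (5, 733), (6, 734), (7, 735)]
```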

Most efficient way to find the longest increasing subsequence in a list of lists
