Combining adjacent features in a list of linear features

Time: 2017-05-23 21:34:25

Tags: python algorithm

Python 3.6

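Task:

Given a sorted list of linear features (as in a linear referencing system), combine adjacent linear features that belong to the same key (linear_feature[0]['key'] == linear_feature[1]['key'] and linear_feature[0]['end'] == linear_feature[1]['start']) until the combined linear feature has (end - start) ≥ THRESHOLD.

If a feature cannot be combined with the following adjacent features to reach (end - start) ≥ THRESHOLD, combine it with the previous adjacent feature of the same key, or return the feature itself.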
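The adjacency condition in the task can be illustrated by splitting a sorted list into runs of features that touch end to start under the same key. This is only a sketch: the helper name `adjacent_runs` and the sample data are made up for illustration and do not come from the question.

```python
def adjacent_runs(features):
    """Split a list sorted by (key, start) into runs of adjacent features.

    Two consecutive features are adjacent when they share a key and the
    first one's 'end' equals the second one's 'start'.
    """
    runs = []
    for feature in features:
        last = runs[-1][-1] if runs else None
        if (last is not None
                and last['key'] == feature['key']
                and last['end'] == feature['start']):
            runs[-1].append(feature)   # continues the current run
        else:
            runs.append([feature])     # gap or new key: start a new run
    return runs


# Hypothetical sample: one gap inside key 1, then a different key.
sample = [
    {'key': 1, 'start': 0, 'end': 2, 'count': 1},
    {'key': 1, 'start': 2, 'end': 4, 'count': 1},
    {'key': 1, 'start': 5, 'end': 6, 'count': 1},  # gap: 4 != 5
    {'key': 2, 'start': 6, 'end': 7, 'count': 1},  # different key
]
print([len(run) for run in adjacent_runs(sample)])  # [2, 1, 1]
```

Only features inside the same run are candidates for combining; the threshold rule then decides how far each run is merged.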

Edit: I have added my solution in an answer post below.

THRESHOLD = 3

linear_features = sorted([
    {'key': 1, 'start': 0, 'end': 2, 'count': 1},
    {'key': 1, 'start': 2, 'end': 4, 'count': 1},
    {'key': 1, 'start': 4, 'end': 5, 'count': 1},
    {'key': 2, 'start': 0, 'end': 3, 'count': 1},
    {'key': 2, 'start': 3, 'end': 4, 'count': 1},
    {'key': 2, 'start': 4, 'end': 5, 'count': 1},
    {'key': 3, 'start': 0, 'end': 1, 'count': 1},
], key=lambda x: (x['key'], x['start']))

# This isn't necessarily an intermediate step, just here for visualization
intermediate = [
    {'key': 1, 'start': 0, 'end': 4, 'count': 2},  # Adjacent features combined
    {'key': 1, 'start': 4, 'end': 5, 'count': 1},  # This can't be made into a feature with (end - start) gte THRESHOLD; combine with previous
    {'key': 2, 'start': 0, 'end': 3, 'count': 1},
    {'key': 2, 'start': 3, 'end': 5, 'count': 2},  # This can't be made into a feature with (end - start) gte THRESHOLD; combine with previous
    {'key': 3, 'start': 0, 'end': 1, 'count': 1},  # This can't be made into a new feature, and there is no previous, so self
]

desired_output = [
    {'key': 1, 'start': 0, 'end': 5, 'count': 3},
    {'key': 2, 'start': 0, 'end': 5, 'count': 3},
    {'key': 3, 'start': 0, 'end': 1, 'count': 1},
]
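Because combining only ever joins features, a candidate output should keep each key's overall extent and total count unchanged. A minimal sketch of such a sanity check follows; the helper names `summarize` and `same_extent_and_count` are hypothetical, and the check is necessary rather than sufficient (it does not verify the threshold rule itself).

```python
def summarize(features):
    """Map each key to (smallest start, largest end, total count)."""
    summary = {}
    for f in features:
        start, end, count = summary.get(f['key'], (f['start'], f['end'], 0))
        summary[f['key']] = (min(start, f['start']),
                             max(end, f['end']),
                             count + f['count'])
    return summary


def same_extent_and_count(original, combined):
    """True if combining kept every key's extent and total count."""
    return summarize(original) == summarize(combined)


# Key 1 from the example above, before and after combining.
before = [
    {'key': 1, 'start': 0, 'end': 2, 'count': 1},
    {'key': 1, 'start': 2, 'end': 4, 'count': 1},
    {'key': 1, 'start': 4, 'end': 5, 'count': 1},
]
after = [{'key': 1, 'start': 0, 'end': 5, 'count': 3}]

print(same_extent_and_count(before, after))       # True
print(same_extent_and_count(before, before[:2]))  # False: last feature dropped
```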

2 Answers:

Answer 0 (score: 0)

I came up with a solution:

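def reducer(x, threshold):
    """Combine the features of one key (non-empty list, sorted by start).

    Note: the dicts in x are modified in place.
    """
    # First pass: fold short features forward into their successor.
    x = add_until(x, threshold)
    if len(x) == 1:
        return x
    if len(x) == 2:
        # After add_until only the last feature can still be too short;
        # if it is, fold it back into the previous feature.
        if length(x[1]) < threshold:
            x[0]['end'] = x[1]['end']
            x[0]['count'] += x[1]['count']
            return [x[0]]
        else:
            return x
    first, rest = x[0], x[1:]
    return [first] + reducer(rest, threshold)


def add_until(x, threshold):
    """Fold each too-short feature into the feature that follows it.

    Adjacency (end == start) is not checked here; the features are
    assumed to be contiguous within a key.
    """
    if len(x) == 1:
        return x
    first, rest = x[0], x[1:]
    if length(first) >= threshold:
        return [first] + add_until(rest, threshold)
    else:
        # Extend the next feature backwards so it absorbs `first`.
        rest[0]['start'] = first['start']
        rest[0]['count'] += first['count']
        return add_until(rest, threshold)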


from itertools import groupby


THRESHOLD = 3

linear_features = sorted([
    {'key': 1, 'start': 0, 'end': 2, 'count': 1},
    {'key': 1, 'start': 2, 'end': 4, 'count': 1},
    {'key': 1, 'start': 4, 'end': 5, 'count': 1},
    {'key': 2, 'start': 0, 'end': 3, 'count': 1},
    {'key': 2, 'start': 3, 'end': 4, 'count': 1},
    {'key': 2, 'start': 4, 'end': 5, 'count': 1},
    {'key': 3, 'start': 0, 'end': 1, 'count': 1},
    {'key': 4, 'start': 0, 'end': 3, 'count': 1},
    {'key': 4, 'start': 3, 'end': 4, 'count': 1},
    {'key': 4, 'start': 4, 'end': 5, 'count': 1},
    {'key': 4, 'start': 5, 'end': 6, 'count': 1},
    {'key': 4, 'start': 6, 'end': 9, 'count': 1},
], key=lambda x: (x['key'], x['start']))

def length(x):
    """x is a dict with a start and end property"""
    return x['end'] - x['start']

results = []

for key, sites in groupby(linear_features, lambda x: x['key']):
    sites = list(sites)
    results += reducer(sites, THRESHOLD)

print(results)

This prints (reformatted here for readability):

[
    {'key': 1, 'start': 0, 'end': 5, 'count': 3},
    {'key': 2, 'start': 0, 'end': 5, 'count': 3},
    {'key': 3, 'start': 0, 'end': 1, 'count': 1},
    {'key': 4, 'start': 0, 'end': 3, 'count': 1},
    {'key': 4, 'start': 3, 'end': 6, 'count': 3},
    {'key': 4, 'start': 6, 'end': 9, 'count': 1}
]
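The recursive functions above modify the input dicts and do not test the end == start adjacency condition. For comparison, here is a hedged single-pass sketch that builds new dicts and treats a gap or a key change as a barrier. The names `combine`, `is_adjacent`, `merged` and `_flush` are hypothetical, and treating gaps as barriers is an assumption read from the task statement, not something the answer above does.

```python
def length(feature):
    """Length of a feature along its linear reference."""
    return feature['end'] - feature['start']


def is_adjacent(a, b):
    """Same key, and a ends exactly where b starts."""
    return a['key'] == b['key'] and a['end'] == b['start']


def merged(a, b):
    """New feature spanning a and b (a must come first)."""
    return {'key': a['key'], 'start': a['start'], 'end': b['end'],
            'count': a['count'] + b['count']}


def _flush(out, pending, threshold):
    """Emit pending; if it is too short, fold it into the previous
    output feature when that one is adjacent, else keep it as is."""
    if length(pending) < threshold and out and is_adjacent(out[-1], pending):
        out[-1] = merged(out[-1], pending)
    else:
        out.append(pending)


def combine(features, threshold):
    """features must be sorted by (key, start); returns a new list."""
    out = []
    pending = None  # feature currently being grown towards threshold
    for feature in features:
        if (pending is not None and length(pending) < threshold
                and is_adjacent(pending, feature)):
            pending = merged(pending, feature)  # keep growing forwards
            continue
        if pending is not None:
            _flush(out, pending, threshold)
        pending = dict(feature)  # copy so the input is left untouched
    if pending is not None:
        _flush(out, pending, threshold)
    return out


features = [
    {'key': 1, 'start': 0, 'end': 2, 'count': 1},
    {'key': 1, 'start': 2, 'end': 4, 'count': 1},
    {'key': 1, 'start': 4, 'end': 5, 'count': 1},
    {'key': 2, 'start': 0, 'end': 3, 'count': 1},
    {'key': 2, 'start': 3, 'end': 4, 'count': 1},
    {'key': 2, 'start': 4, 'end': 5, 'count': 1},
    {'key': 3, 'start': 0, 'end': 1, 'count': 1},
]
for row in combine(features, 3):
    print(row)
```

On the question's seven sample features this yields the same three rows as desired_output. No groupby is needed because the adjacency test already compares keys.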

Answer 1 (score: 0)

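You want something like this:

Pseudocode (written out as Python; features is the list sorted by key and start):

f = 1
while f < len(features):
    prev, cur = features[f - 1], features[f]
    if prev['key'] == cur['key'] and prev['end'] == cur['start']:
        # Combine: extend the previous feature over the current one
        prev['end'] = cur['end']
        prev['count'] += cur['count']
        # Drop the absorbed feature; f stays put so the next one is compared with prev
        del features[f]
    else:
        f += 1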
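Note that this loop never consults THRESHOLD: it joins every adjacent pair, so a key whose features are all contiguous collapses into a single feature. The sketch below applies the same merging rule without modifying its input (the names `merge_step` and `merge_all_adjacent` are hypothetical) and runs it on the key 4 data from the first answer.

```python
from functools import reduce


def merge_step(acc, feature):
    """Fold one feature into the accumulated list of merged features."""
    if (acc and acc[-1]['key'] == feature['key']
            and acc[-1]['end'] == feature['start']):
        last = acc[-1]
        joined = {**last, 'end': feature['end'],
                  'count': last['count'] + feature['count']}
        return acc[:-1] + [joined]
    return acc + [dict(feature)]


def merge_all_adjacent(features):
    """Merge every run of adjacent same-key features, ignoring any threshold."""
    return reduce(merge_step, features, [])


key4 = [
    {'key': 4, 'start': 0, 'end': 3, 'count': 1},
    {'key': 4, 'start': 3, 'end': 4, 'count': 1},
    {'key': 4, 'start': 4, 'end': 5, 'count': 1},
    {'key': 4, 'start': 5, 'end': 6, 'count': 1},
    {'key': 4, 'start': 6, 'end': 9, 'count': 1},
]
print(merge_all_adjacent(key4))
# [{'key': 4, 'start': 0, 'end': 9, 'count': 5}]
```

The first answer keeps three features for key 4 (0 to 3, 3 to 6, 6 to 9), so the two approaches agree only when a key should end up as one feature, as in the question's own sample.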