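Creating new intervals in Python by excluding positions

Time: 2014-03-22 00:22:11

Tags: python list

Suppose I have a rope spanning positions 0-5000. I want to cut this rope so that the intervals in the list below are removed, and then return the remaining pieces:

My list: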

['HE670029', '4095', '4096']
['HE670029', '4098', '4099']
['HE670029', '4102', '4102']

Desired output (it doesn't have to be a list; each list could be written to a file on its own line):

['HE670029', '0', '4094']
['HE670029', '4097', '4097']
['HE670029', '4100', '4101']
['HE670029', '4103', '5000']

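I tried manipulating dictionaries, but without success. I don't know how to get the data into a format that lets me do what I need.

3 answers:

Answer 0: (score: 1)

It's not pretty, but it works: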

sections_to_cut = [
    ['HE670029', '4095', '4096'],
    ['HE670029', '4098', '4099'],
    ['HE670029', '4102', '4102']
]

ropes = {}
for rope in sections_to_cut:
    if rope[0] not in ropes: # could use default dict instead
        ropes[rope[0]] = []
    ropes[rope[0]].append((int(rope[1]), int(rope[2])))

cut_ropes = []

for rope_name, exclude_values in ropes.items():
    sorted_ex = sorted(exclude_values, key=lambda x: x[0])
    a = 0
    for i in sorted_ex:
        cut_ropes.append([rope_name, str(a), str(i[0]-1)])
        a = i[1] + 1
    cut_ropes.append([rope_name, str(a), str(5000)])

print(cut_ropes)
# [['HE670029', '0', '4094'], ['HE670029', '4097', '4097'], ['HE670029', '4100', '4101'], ['HE670029', '4103', '5000']]
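
The comment in the grouping loop mentions a default dict. A minimal sketch of that variant, wrapped in a function, might look like the following. The name remaining_segments and its start/end parameters are assumptions introduced here for illustration. Unlike the code above, this sketch also skips empty pieces (for example when a cut begins at position 0).

```python
from collections import defaultdict


def remaining_segments(sections_to_cut, start=0, end=5000):
    """Hypothetical helper: return the pieces left after removing the cuts."""
    # Group the (left, right) cuts by rope name; defaultdict creates
    # the empty list automatically the first time a name is seen.
    ropes = defaultdict(list)
    for name, left, right in sections_to_cut:
        ropes[name].append((int(left), int(right)))

    result = []
    for name, cuts in ropes.items():
        position = start
        for left, right in sorted(cuts):
            # Keep the piece before this cut only if it is non-empty.
            if left > position:
                result.append([name, str(position), str(left - 1)])
            position = max(position, right + 1)
        # Keep the tail only if something is left after the last cut.
        if position <= end:
            result.append([name, str(position), str(end)])
    return result


sections_to_cut = [
    ['HE670029', '4095', '4096'],
    ['HE670029', '4098', '4099'],
    ['HE670029', '4102', '4102'],
]
for segment in remaining_segments(sections_to_cut):
    print(segment)
```

Each remaining piece is printed on its own line, which is one of the output forms the question allows.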

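Answer 1: (score: 0)

I started writing this before I saw that your intervals can't overlap. This approach is overkill, but I'll leave it here because throwing it away seems like a waste.

For a short solution, see the bottom.

The OOP-ish way of doing things: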

class Interval:
    def __init__(self,left,right):
        self.left = int(left)
        self.right = int(right)
    def __contains__(self,x):
        return self.left <= int(x) <= self.right

intervals = [['HE670029', '4095', '4096'],
['HE670029', '4098', '4099'],
['HE670029', '4102', '4102']]

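# if intervals aren't sorted, then do (sorting numerically, not as strings):
# cuts = [Interval(*x[1:]) for x in sorted(intervals, key=lambda i: int(i[1]))]
cuts = [Interval(*x[1:]) for x in intervals]

# this step is overkill, since we know our intervals can't overlap
# note: pairing these points two at a time in gen_segments assumes every cut
# covers exactly two positions, except that the last cut may cover one
breakpoints = [x for x in range(1, 5000) if any(x in cut for cut in cuts)]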

def gen_segments(breakpoints, id_='HE670029', start=0, end=5000 ):
    for pair in chunks(breakpoints,2):
        if len(pair) < 2: #last breakpoint may be singleton
            pair += pair
        left,right = pair
        yield id_, start, left-1
        start = right+1
    yield id_, start, end

chunks is one of the several chunking recipes on this page. Demo:

list(gen_segments(breakpoints))
Out[258]: 
[('HE670029', 0, 4094),
 ('HE670029', 4097, 4097),
 ('HE670029', 4100, 4101),
 ('HE670029', 4103, 5000)]
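
The chunks helper that gen_segments relies on is not shown in this answer. A minimal sketch of one common recipe, assuming it should return consecutive slices of length n with a possibly shorter final slice, could be:

```python
def chunks(seq, n):
    """Split seq into consecutive slices of length n; the last may be shorter."""
    return [seq[i:i + n] for i in range(0, len(seq), n)]


print(chunks([4095, 4096, 4098, 4099, 4102], 2))
# [[4095, 4096], [4098, 4099], [4102]]
```

Returning list slices matters here, because gen_segments extends a singleton chunk with pair += pair before unpacking it.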

Like I said, the above is overkill. If you know your intervals don't overlap, you don't need the fancy Interval class or anything else. Just do this:

breakpoints = [int(x) for interval in intervals for x in interval[1:]]

Then use gen_segments from above directly.
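
Put together, the short version can be sketched as one self-contained script. The pairing below uses zip over two slices instead of a chunks helper, which is a substitution made for this sketch, and segments_between is a hypothetical name. It assumes the intervals are sorted and do not overlap.

```python
intervals = [
    ['HE670029', '4095', '4096'],
    ['HE670029', '4098', '4099'],
    ['HE670029', '4102', '4102'],
]


def segments_between(intervals, id_='HE670029', start=0, end=5000):
    """Hypothetical helper: yield the pieces left between sorted, non-overlapping cuts."""
    # Flatten to [left1, right1, left2, right2, ...].
    breakpoints = [int(x) for interval in intervals for x in interval[1:]]
    # Even positions hold left edges, odd positions hold right edges.
    for left, right in zip(breakpoints[0::2], breakpoints[1::2]):
        yield id_, start, left - 1
        start = right + 1
    yield id_, start, end


print(list(segments_between(intervals)))
```

Because every interval contributes exactly two breakpoints, the list always has even length and no singleton handling is needed.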

Answer 2: (score: 0)

I won't spoil it for you, but I'll give you a hint. Given

xs = [
    ['HE670029', '4095', '4096'],
    ['HE670029', '4098', '4099'],
    ['HE670029', '4102', '4102']]

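The first and last pieces are easy to do: just 0 -> the first node, and then the last node -> 5000. You need the values in between...

First, create functions that extract the values at the two ends of a rope: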

def head(x): return int(x[1])
def last(x): return int(x[-1])

Now you need to pair each row with the one that follows it, like this:

[(a, b) for a, b in zip(xs[:-1], xs[1:])]

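Now that you have those pairs, you can use the functions you just created to extract the last value of one row and the first value of the next...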

[(last(a),head(b)) for (a,b) in zip(xs[:-1], xs[1:])]

Those aren't quite the values you want, are they? You need to shift them by one here...

[(last(a)+1,head(b)-1) for (a,b) in zip(xs[:-1], xs[1:])]

Finally, just put them into the right list form:

xM = [['HE670029', str(last(a)+1),str(head(b)-1)] for (a,b) in zip(xs[:-1], xs[1:])]    

Now you have 2 lists, xs and xM. I'm sure you can loop over them and combine them... If you want to improve the result, you could consider using zip, list, concat.
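
For readers who do want the pieces assembled, one possible way to finish the hint is sketched below. It assumes xs is sorted and non-overlapping and takes the 0 and 5000 bounds from the question. The names first_piece, last_piece and result are introduced here only for illustration.

```python
xs = [
    ['HE670029', '4095', '4096'],
    ['HE670029', '4098', '4099'],
    ['HE670029', '4102', '4102'],
]


def head(x):
    return int(x[1])


def last(x):
    return int(x[-1])


name = xs[0][0]
# Piece before the first cut and piece after the last cut.
first_piece = [name, '0', str(head(xs[0]) - 1)]
last_piece = [name, str(last(xs[-1]) + 1), '5000']
# Gaps between each pair of consecutive cuts.
xM = [[name, str(last(a) + 1), str(head(b) - 1)] for a, b in zip(xs[:-1], xs[1:])]

# Plain list concatenation keeps the pieces in order.
result = [first_piece] + xM + [last_piece]
for row in result:
    print(row)
```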