Python data manipulation for a specific dataset

Time: 2014-09-16 09:06:52

Tags: python

I ran into this sub-problem while solving a larger problem.

I have a table like this:

start end   

313  516
517  1878
1879 2155
3649 3669
3670 5024
5034 6968

My output should be:

313  2155
3649 5024
5034 6968

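This output is obtained by merging contiguous rows: a row is merged into the previous one when its start is exactly one more than the previous end. Here 1878 and 1879 are consecutive, and so on.

I tried this: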

i = 0
if start[i+1] == end[i]+1 :
    table.append(start[i],end[i+1])

It prints:

313 1878
517 2155

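and so on...

As expected, it only merges one level, i.e. two neighbouring rows at a time. I want it to work for any number of consecutive rows.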
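The behaviour described above can be reproduced with a minimal sketch. It assumes the comparison from the attempt ran inside a loop over neighbouring rows (the loop is not shown in the question), and the helper name `merge_adjacent_only` is hypothetical. Each match joins only two rows, so a chain of three contiguous rows yields two overlapping pairs instead of one merged range, and rows with no contiguous neighbour are dropped.

```python
start = [313, 517, 1879, 3649, 3670, 5034]
end = [516, 1878, 2155, 3669, 5024, 6968]

def merge_adjacent_only(start, end):
    """Hypothetical reconstruction of the attempt: pair up neighbouring rows."""
    table = []
    for i in range(len(start) - 1):
        # Row i+1 begins right after row i ends, so join just these two rows.
        if start[i + 1] == end[i] + 1:
            table.append((start[i], end[i + 1]))
    return table

for s, e in merge_adjacent_only(start, end):
    print(s, e)
# 313 1878
# 517 2155
# 3649 5024
```

Getting one range per run requires remembering where the current run started and extending it for as long as the rows stay contiguous.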

3 answers:

Answer 0: (score: 2)

This is a perfect job for reduce:

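from functools import reduce  # built in on Python 2, needs this import on Python 3

def squash(lsts, el):
    if not lsts:                      # First pair: start the first merged range
        return [list(el)]
    if lsts[-1][1] == el[0] - 1:      # Contiguous with the last merged range,
        lsts[-1][1] = el[1]           # so extend that range's end
    else:
        lsts.append(list(el))         # Otherwise start a new range
    return lsts

print(reduce(squash, zip(start, end), []))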

Output

[[313, 2155], [3649, 5024], [5034, 6968]]
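For readers less familiar with reduce, the fold can be unrolled into a plain loop. This is only a sketch: `squash` is copied from the answer, while `merge_with_loop` is a hypothetical name and the sample lists are taken from the question's table. The accumulator starts as an empty list and is passed through `squash` once per (start, end) pair.

```python
from functools import reduce

def squash(lsts, el):
    if not lsts:
        return [list(el)]
    if lsts[-1][1] == el[0] - 1:
        lsts[-1][1] = el[1]
    else:
        lsts.append(list(el))
    return lsts

def merge_with_loop(pairs):
    """Hypothetical helper: does by hand what reduce(squash, pairs, []) does."""
    acc = []                   # same initial value passed to reduce
    for el in pairs:
        acc = squash(acc, el)  # feed each pair through the folding function
    return acc

start = [313, 517, 1879, 3649, 3670, 5034]
end = [516, 1878, 2155, 3669, 5024, 6968]
pairs = list(zip(start, end))  # materialise so the pairs can be reused

print(merge_with_loop(pairs))
# [[313, 2155], [3649, 5024], [5034, 6968]]
print(merge_with_loop(pairs) == reduce(squash, pairs, []))
# True
```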

Answer 1: (score: 1)

How about this:

def consolidate(start, end):
    if not start:                    # Nothing to merge
        return []
    _start = start[:]                # Make a copy since we're modifying the list
    result = []
    for i in range(len(_start)-1):   # Iterate until the second-to-last pair
        if _start[i+1] == end[i]+1:  # If two pairs are contiguous,
            _start[i+1] = _start[i]  # replace the start value with the previous one
        else:                                  # Otherwise
            result.append((_start[i], end[i])) # add the current pair to the result
    result.append((_start[-1], end[-1]))       # Don't forget the ultimate pair
    return result

Result:

>>> start = [313,517,1879,3649,3670,5034,6969]
>>> end = [516,1878,2155,3669,5024,6968,7000]
>>> consolidate(start,end)
[(313, 2155), (3649, 5024), (5034, 7000)]

Answer 2: (score: 1)

Use zip to iterate over the two lists:

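def func(start, end):
    if not start:                     # Nothing to merge
        return []
    result = []
    first = start[0]                  # Start of the run currently being built

    # Pair each row's start (from the second row on) with the previous row's end
    for i, j in zip(start[1:], end):
        if i == j + 1:                # Contiguous: keep extending the current run
            continue
        result.append((first, j))     # Gap found: close the run at the previous end
        first = i

    result.append((first, end[-1]))   # Close the final run
    return result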

Example run:

In [73]: start = [313, 517, 1879, 3649, 3670, 5034, 6969]

In [74]: end = [516, 1878, 2155, 3669, 5024, 6968, 7000]

In [75]: func(start, end)
Out[75]: [(313, 2155), (3649, 5024), (5034, 7000)]
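The function above relies on the rows already being sorted by start, as they are in the question's table. If that cannot be guaranteed, one option is to sort the pairs first. The sketch below is an assumption rather than part of any answer: `merge_rows` is a hypothetical name, and it applies the same contiguity rule (next start equals previous end plus one).

```python
def merge_rows(start, end):
    """Merge rows whose start is exactly one past the previous row's end."""
    merged = []
    # Sort the (start, end) pairs so each row is compared with its true neighbour.
    for s, e in sorted(zip(start, end)):
        if merged and s == merged[-1][1] + 1:
            merged[-1][1] = e      # contiguous: extend the current run
        else:
            merged.append([s, e])  # gap (or first row): begin a new run
    return [tuple(run) for run in merged]

# The question's rows, deliberately shuffled.
start = [3649, 313, 1879, 517, 5034, 3670]
end = [3669, 516, 2155, 1878, 6968, 5024]
print(merge_rows(start, end))
# [(313, 2155), (3649, 5024), (5034, 6968)]
```

An empty input simply returns an empty list, because the loop body never runs.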