From a string

Date: 2016-08-01 23:45:50

Tags: python optimization

I'm really curious whether people can go faster than my implementation (using pure Python, or anything else, but then just for your own sake).

sentence = "This is some example sentence where we remove parts"
matches = [(5, 10), (13, 18), (22, 27), (38, 42)]

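The goal is to remove the characters inside these ranges. For example, for the match (5, 10), the characters at indices 5, 6, 7, 8, 9 should be omitted from the return value.

My implementation: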

def remove_matches(sentence, matches):
    new_s = ''
    lbound = 0
    for l, h in matches:
        new_s += sentence[lbound:l]
        lbound = h
    new_s += sentence[matches[-1][1]:]
    return new_s

Result: 'This me le sce where weove parts'

Note that the matches never overlap, and you may take advantage of that fact.
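To make that precondition concrete, here is a minimal sketch of a check that the ranges are sorted, non-overlapping and inside the string. `check_matches` is a hypothetical helper name, not something from the question, and the "sorted" requirement is an assumption based on how the example list is written.

```python
sentence = "This is some example sentence where we remove parts"
matches = [(5, 10), (13, 18), (22, 27), (38, 42)]

def check_matches(sentence, matches):
    # Hypothetical helper: each range must start at or after the end of the
    # previous one and must stay inside the string.
    prev_end = 0
    for low, high in matches:
        if not (prev_end <= low <= high <= len(sentence)):
            raise ValueError(f"bad or overlapping range: {(low, high)}")
        prev_end = high
    return True

check_matches(sentence, matches)  # passes silently for the example data
```

With that guarantee, a single left-to-right pass that only remembers the end of the previous range is enough.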

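Actually, my main question is simply: can't we do this in some vectorized way? I'm sure numpy could do it, but I doubt it would be more efficient in this case.

Benchmarks: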

PascalvKooten:           1000000 loops, best of 3: 1.34 µs per loop
Ted Klein Bergman (1):   1000000 loops, best of 3: 1.59 µs per loop
Ted Klein Bergman (2):    100000 loops, best of 3: 2.58 µs per loop 
Prune:                    100000 loops, best of 3: 2.05 µs per loop
njzk2:                    100000 loops, best of 3: 3.19 µs per loop
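The thread doesn't say exactly how these timings were taken (the format looks like IPython's %timeit). As a rough sketch, a comparison like this could be reproduced with the standard `timeit` module. The names `remove_concat`, `remove_join` and `best_time_us` are made up for this example, and the lambda wrapper adds a little call overhead, so absolute numbers will differ from the table.

```python
import timeit

sentence = "This is some example sentence where we remove parts"
matches = [(5, 10), (13, 18), (22, 27), (38, 42)]

def remove_concat(sentence, matches):
    # Repeated string concatenation, as in the question.
    new_s = ''
    lbound = 0
    for l, h in matches:
        new_s += sentence[lbound:l]
        lbound = h
    return new_s + sentence[lbound:]

def remove_join(sentence, matches):
    # Collect the kept pieces in a list and join them once at the end.
    pieces = []
    lbound = 0
    for l, h in matches:
        pieces.append(sentence[lbound:l])
        lbound = h
    pieces.append(sentence[lbound:])
    return ''.join(pieces)

def best_time_us(func, number=20000, repeat=3):
    # Best-of-`repeat` time per call, in microseconds.
    timer = timeit.Timer(lambda: func(sentence, matches))
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e6

if __name__ == "__main__":
    for func in (remove_concat, remove_join):
        print(f"{func.__name__}: {best_time_us(func):.2f} µs per call")
```

For only four matches the two variants tend to land close together, so it is worth timing with input that resembles your real data.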

5 answers:

Answer 0 (score: 1)

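This might be faster. It's basically your solution, but it uses a list instead of a string. Since lists are mutable and don't have to be recreated on every iteration, it should be faster (though perhaps not for this few matches).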

sentence = "This is some example sentence where we remove parts"
matches = [(5, 10), (13, 18), (22, 27), (38, 42)]

def remove_matches(sentence, matches):
    result = []
    i = 0
    for x, y in matches:
        result.append(sentence[i:x])
        i = y
    result.append(sentence[i:])

    return "".join(result)

This approach might be even faster:

def remove_matches(sentence, matches):
    return "".join(
        [sentence[0:matches[i][0]] if i == 0 else 
         sentence[matches[i - 1][1]:matches[i][0]] if i != len(matches) else 
         sentence[matches[i - 1][1]::] for i in range(len(matches) + 1)
         ])

Answer 1 (score: 0)

shortened = sentence[:matches[0][0]] + "".join([sentence[matches[i-1][1]:matches[i][0]] for i in range(1, len(matches))]) + sentence[matches[-1][1]:]

Since I'm on my phone I can't debug this, but it should work :D

Answer 2 (score: 0)

If you prepend (None, 0) to the front of matches and append (len(sentence), None) to the back

sentence = "This is some example sentence where we remove parts"
matches = [(None, 0),
           (5, 10), (13, 18), (22, 27), (38, 42),
           (len(sentence), None)]

then you can write the join expression over the slices

matches[i][1]:matches[i+1][0] for i in range(len(matches)-1)

Is that enough to get you moving?
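Spelled out in full, the sentinel idea might look like the sketch below. It uses None as the unused half of each sentinel tuple, and `remove_matches_sentinel` is just an illustrative name. The sentinels are added to a copy so the caller's list is left untouched.

```python
sentence = "This is some example sentence where we remove parts"
matches = [(5, 10), (13, 18), (22, 27), (38, 42)]

def remove_matches_sentinel(sentence, matches):
    # Pad the ranges so that every kept slice runs from the end of one entry
    # to the start of the next one, including the head and the tail.
    bounds = [(None, 0)] + list(matches) + [(len(sentence), None)]
    return "".join(
        sentence[bounds[i][1]:bounds[i + 1][0]]
        for i in range(len(bounds) - 1)
    )

print(remove_matches_sentinel(sentence, matches))
# This me le sce where weove parts
```

The padding removes the special cases for the first and last slice, at the cost of building one extra list.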

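Answer 3 (score: 0)

If strings were mutable, a fast solution would be possible by shifting the characters of the successive kept substrings in place.

An optimal C solution would consist of a few memmove calls.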
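Python strings are immutable, but the same idea can be imitated with a mutable `bytearray`. This is only a sketch: it assumes every character fits in one byte (latin-1), so that character indices equal byte indices, and that the ranges are sorted and non-overlapping. The name `remove_matches_inplace` is made up, and it says nothing about speed, since the conversions to and from bytes cost time too.

```python
def remove_matches_inplace(sentence, matches):
    # Assumption: latin-1 text, so one character is exactly one byte.
    buf = bytearray(sentence, "latin-1")
    write = 0  # next free position in the compacted prefix
    read = 0   # start of the next span to keep
    for low, high in matches:
        n = low - read
        # Slide the kept span down. The right-hand slice is copied before the
        # assignment, so overlapping source and target are safe.
        buf[write:write + n] = buf[read:low]
        write += n
        read = high
    # Move the tail after the last removed range, then drop the leftovers.
    tail = len(buf) - read
    buf[write:write + tail] = buf[read:]
    write += tail
    del buf[write:]
    return buf.decode("latin-1")

sentence = "This is some example sentence where we remove parts"
matches = [(5, 10), (13, 18), (22, 27), (38, 42)]
print(remove_matches_inplace(sentence, matches))
# This me le sce where weove parts
```

Each slice assignment plays the role of one memmove call: the kept span is copied toward the front and nothing else is touched.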

Answer 4 (score: 0)

Rather than removing characters, I would define which ones to keep, which makes the operation easier:

sentence = "This is some example sentence where we remove parts"
matches = [(5, 10), (13, 18), (22, 27), (38, 42)]
chain = (None,) + sum(matches, ()) + (None,)
# chain == (None, 5, 10, 13, 18, 22, 27, 38, 42, None)
keep = zip(chain[::2], chain[1::2])
# the pairs in keep: (None, 5), (10, 13), (18, 22), (27, 38), (42, None)
# or: keep = ((m1[1], m2[0]) for m1, m2 in zip([(None, None)] + matches, matches + [(None, None)]))
result = ''.join(sentence[x:y] for x, y in keep)