Getting ordered occurrences of substrings

Time: 2017-06-17 21:39:59

Tags: string python-3.x

If I have a string "abbabaabbaba" and I want to find the ordered occurrences of some substrings "ab", "bb", I can do it with multiple .find() calls:

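def foo(string, substrings):
    tuples = []
    for substring in substrings:
        # Search the original string so indices are not shifted by slicing.
        index = string.find(substring)
        while index != -1:
            tuples.append((index, substring))
            # Resume one character later so overlapping matches are found too.
            index = string.find(substring, index + 1)
    # Sort by position so occurrences of all substrings are interleaved in order.
    return sorted(tuples)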

But is there perhaps a shorter way? Something like:

def bar(string, substring):
    return ((index, substring) for substring in string.find(substring) if index != -1)

(but one that actually works)

Example:
>>> foo("abbabaabbaba", ["ab", "bb"])
[(0, 'ab'), (1, 'bb'), (3, 'ab'), (6, 'ab'), (7, 'bb'), (9, 'ab')]

2 answers:

Answer 0: (score: 1)

You can use a list comprehension and string slicing, as in this example:

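def get_occurrence(a, args, step=2):
    # Compare each window of length `step` against args; only substrings of exactly that length can match.
    return [(k, a[k:k+step]) for k in range(len(a)) if a[k:k+step] in args]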

a = "abbabaabbaba"
occurrences = get_occurrence(a, ['ab', 'bb'])
print(occurrences)

Output:

[(0, 'ab'), (1, 'bb'), (3, 'ab'), (6, 'ab'), (7, 'bb'), (9, 'ab')]
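
The fixed `step` above assumes every substring has the same length. Here is a minimal sketch of one way to lift that restriction, using `str.startswith` with a start offset. The name `get_occurrences_any_length` is hypothetical and not part of this answer:

```python
def get_occurrences_any_length(text, substrings):
    # Hypothetical helper: test every substring at every position, so the
    # substrings may differ in length and matches may overlap.
    # Empty substrings are skipped because they would match everywhere.
    return [(k, s)
            for k in range(len(text))
            for s in substrings
            if s and text.startswith(s, k)]

print(get_occurrences_any_length("abbabaabbaba", ["ab", "bb"]))
# [(0, 'ab'), (1, 'bb'), (3, 'ab'), (6, 'ab'), (7, 'bb'), (9, 'ab')]
```

The result is already ordered by position, so no sort is needed. The cost is roughly one `startswith` check per position per substring.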

Answer 1: (score: 0)

May I use the hot new regex library?

import regex

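def foo(string, substrings):
    # Escape each substring so it is matched literally, then join them as alternatives.
    pattern = '(' + '|'.join(regex.escape(s) for s in substrings) + ')'
    # overlapped=True lets a match start inside the previous match.
    return [(match.start(), match.group(1))
            for match in regex.finditer(pattern, string, overlapped=True)]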

foo("abbabaabbaba", ["ab", "bb"])
# -> [(0, 'ab'), (1, 'bb'), (3, 'ab'), (6, 'ab'), (7, 'bb'), (9, 'ab')]
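
If installing the third-party `regex` package is not an option, a similar result can probably be had with the standard `re` module by wrapping the alternation in a zero-width lookahead. A lookahead consumes no characters, so matches may overlap. This is a sketch under that assumption, and `foo_stdlib` is a hypothetical name:

```python
import re

def foo_stdlib(string, substrings):
    # (?=...) matches at a position without consuming it, so finditer
    # tries every position and overlapping occurrences are kept.
    pattern = '(?=(' + '|'.join(re.escape(s) for s in substrings) + '))'
    return [(match.start(), match.group(1))
            for match in re.finditer(pattern, string)]

print(foo_stdlib("abbabaabbaba", ["ab", "bb"]))
# [(0, 'ab'), (1, 'bb'), (3, 'ab'), (6, 'ab'), (7, 'bb'), (9, 'ab')]
```

Note that at any given position only the first alternative that matches is reported, so substrings sharing a prefix (for example "ab" and "abb") would need extra handling.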