Best way to split a string in Python on multiple delimiters while keeping the delimiters

Time: 2019-05-15 12:12:44

Tags: python regex string split substring

Suppose I have the string:

string = "this is a test string <LW> I want to <NL>split this string<NL> by each tag I have inserted.<AB>"

I want to split the string at each of the custom tags that a previous function inserted into it:

tags = ["<LW>", "<NL>", "<AB>"]

This is the desired output:

splitString = splitByTags(string, tags)

for s in splitString:
    print(s)

Output:

"this is a test string <LW>"
" I want to <NL>"
"split this string<NL>"
" by each tag I have inserted.<AB>"

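So basically I want to split the string into multiple substrings while keeping the delimiters in the resulting pieces. What is the fastest, most efficient way to do this? I know I could use string.split and simply append the delimiter text to each piece, but I'm not sure how to do that with multiple delimiters.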

2 answers:

Answer 0: (score: 3)

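Use re.split with capturing parentheses.

For example:

import re
string = "this is a test string <LW> I want to <NL>split this string<NL> by each tag I have inserted.<AB>"
tags = ["<LW>", "<NL>", "<AB>"]

# The capturing group makes re.split keep each matched tag in the result list
splt_str = re.split("(" + "|".join(tags) + ")", string)

# Text and tags alternate, so join each text piece with the tag that follows it
for i in range(0, len(splt_str), 2):
    print("".join(splt_str[i:i+2]))

Output:

this is a test string <LW>
 I want to <NL>
split this string<NL>
 by each tag I have inserted.<AB>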
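
The same idea can be wrapped in a reusable function. The following is a minimal sketch and not part of the original answer: the name split_by_tags is hypothetical, re.escape is added on the assumption that a tag might contain regex metacharacters, and the empty piece that re.split returns when the text ends with a tag is dropped.

```python
import re

def split_by_tags(text, tags):
    """Split text after each tag, keeping the tag at the end of its piece."""
    # Escape each tag so characters such as "." or "*" match literally.
    pattern = "(" + "|".join(re.escape(tag) for tag in tags) + ")"
    parts = re.split(pattern, text)
    # parts alternates: text, tag, text, tag, ..., final text (possibly empty).
    pieces = ["".join(parts[i:i + 2]) for i in range(0, len(parts), 2)]
    # Drop the empty piece produced when the text ends with a tag.
    return [piece for piece in pieces if piece]

if __name__ == "__main__":
    string = "this is a test string <LW> I want to <NL>split this string<NL> by each tag I have inserted.<AB>"
    tags = ["<LW>", "<NL>", "<AB>"]
    for s in split_by_tags(string, tags):
        print(s)
```

Because the alternation is scanned left to right across the whole string, the pieces come out in their original order regardless of the order in which the tags are listed.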

Answer 1: (score: 0)

Here is an example of how to do this:

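import re

def split_string(string, tags):
    string_list = []
    start = 0
    # Match any of the tags in a single left-to-right pass, so the pieces
    # stay in order even when the tags appear in a different order than in the list
    pattern = "|".join(re.escape(tag) for tag in tags)
    for item in re.finditer(pattern, string):
        end_tag = item.end()
        string_list.append(string[start:end_tag])
        start = end_tag
    # Keep any text that follows the last tag
    if start < len(string):
        string_list.append(string[start:])
    return string_list

# string and tags are the ones defined in the question
data = split_string(string, tags)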

Output:

['this is a test string <LW>', ' I want to <NL>', 'split this string<NL>', ' by each tag I have inserted.<AB>']
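
A further variation, offered as a sketch rather than as part of either answer: re.findall can return each piece together with its trailing tag in one call. The helper name split_keep_tags is hypothetical, and the pattern assumes that any text left after the last tag should be returned as a final piece.

```python
import re

def split_keep_tags(text, tags):
    """Return the pieces of text, each ending with the tag that closes it."""
    alternation = "|".join(re.escape(tag) for tag in tags)
    # First alternative: the shortest run of characters that ends in a tag.
    # Second alternative: whatever non-empty text remains after the last tag.
    pattern = r".*?(?:" + alternation + r")|.+"
    # re.DOTALL lets "." match newlines, so pieces may span several lines.
    return re.findall(pattern, text, flags=re.DOTALL)

if __name__ == "__main__":
    string = "this is a test string <LW> I want to <NL>split this string<NL> by each tag I have inserted.<AB>"
    tags = ["<LW>", "<NL>", "<AB>"]
    print(split_keep_tags(string, tags))
```

The group is non-capturing on purpose: with a capturing group, re.findall would return only the tags instead of the full pieces.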