I am trying to break up a string that contains several lists in different formats. What is the best way to do this?
string = "something here: 1) A i) great ii) awesome 2) B"
another_string = "But sometimes it is different (1) yep (2) not the same i. or this ii. another bullet (3.1) getting difficult huh? 3.1.1 okay i'm done"
Ideally, I would like to split on every possible kind of numbered or bulleted list marker.
Desired output for string:
something here: 1) A
i) great
ii) awesome
2) B
Desired output for another_string:
But sometimes it is different (1) yep
(2) not the same
i. or this
ii. another bullet
(3.1) getting difficult huh?
3.1.1 okay i'm done
Answer 0 (score: 1)
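You can use re.split with the following regex (the Roman numeral part is borrowed from paxdiablo) to split the input string on the list markers, and then join the pieces back together using an iterator: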
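import re

def split(s):
    # The capturing group makes re.split keep each list marker in the result,
    # so the pieces alternate: leading text, marker, text, marker, text, ...
    i = iter(re.split(r'(\(?\d+(?:\.\d+)+\)?|\(?\d+\)|\(?\b(?=M|(?:CM|CD|D?C)|(?:XC|XL|L?X)|(?:IX|IV|V?I))M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})[.)])', s, flags=re.IGNORECASE))
    # Keep the leading text, then pair each marker with the text after it.
    return next(i) + '\n'.join(map(''.join, zip(i, i)))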
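To see why the iterator trick works, here is a minimal sketch with a deliberately simplified pattern that only recognizes markers like "1)". The pattern and the sample text are made up for illustration and are not part of the answer's regex. Because the pattern is wrapped in a capturing group, re.split keeps the markers in its result, and zip(i, i) draws two consecutive items from the same iterator, which pairs each marker with the text that follows it.

```python
import re

# Simplified, illustrative pattern: only matches markers such as "1)" or "12)".
parts = re.split(r'(\d+\))', "intro: 1) A 2) B")
print(parts)  # ['intro: ', '1)', ' A ', '2)', ' B']

i = iter(parts)
head = next(i)           # text before the first marker
pairs = list(zip(i, i))  # same iterator twice -> consecutive (marker, text) pairs
print(pairs)  # [('1)', ' A '), ('2)', ' B')]

# Glue each marker to its text and put every pair on its own line.
result = head + '\n'.join(map(''.join, pairs))
print(result)
```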
With your sample inputs:
split(string)
returns:
something here: 1) A
i) great
ii) awesome
2) B
and:
split(another_string)
returns:
But sometimes it is different (1) yep
(2) not the same
i. or this
ii. another bullet
(3.1) getting difficult huh?
3.1.1 okay i'm done
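Note that the returned string keeps the space that preceded each marker, so every line except the last ends with a trailing space. If a list of clean items is more convenient than one joined string, the following sketch is one possible variant. It reuses the same pattern; the names MARKER and split_items are hypothetical and not part of the answer above.

```python
import re

# Same pattern as in the answer, compiled once (split over two raw strings for readability).
MARKER = re.compile(
    r'(\(?\d+(?:\.\d+)+\)?|\(?\d+\)|\(?\b(?=M|(?:CM|CD|D?C)|(?:XC|XL|L?X)|(?:IX|IV|V?I))'
    r'M{0,4}(?:CM|CD|D?C{0,3})(?:XC|XL|L?X{0,3})(?:IX|IV|V?I{0,3})[.)])',
    re.IGNORECASE,
)

def split_items(s):  # hypothetical helper name
    """Return the list items as strings without surrounding whitespace."""
    i = iter(MARKER.split(s))
    head = next(i)  # text before the first marker
    items = [marker + text for marker, text in zip(i, i)]
    if items:
        # As in the original, the leading text stays attached to the first item.
        items[0] = head + items[0]
    else:
        items = [head]
    return [item.strip() for item in items]

print(split_items("something here: 1) A i) great ii) awesome 2) B"))
# ['something here: 1) A', 'i) great', 'ii) awesome', '2) B']

# Caveat: with IGNORECASE, an ordinary word made up only of Roman numeral
# letters and followed by "." or ")" is also treated as a marker.
print(split_items("1) A nice mix. 2) B"))
# ['1) A nice', 'mix.', '2) B']
```

The second call shows a limitation of the pattern itself rather than of this variant: "mix." parses as a valid Roman numeral followed by a period, so free text that ends a sentence with such a word will be split there as well.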