Question

I want to split a string like this:

string = "Lines: 67 \n\nThis is an example"

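Note that "67" is only an example; sometimes it will be "315" or "6666", so the number of digits is not fixed. I know a regular expression could do this, but I don't know how to apply it.

One more note: the string can sometimes look like this instead: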

"Lines: 6777       \nThis is an example"

string = string.split("Lines:")

Current result:

["", " 67 \n\nThis is an example"]

Expected result:

["Lines: 67", " \n\nThis is an example"] #If possible I wish the string[1] to have no front space. So maybe I can use ".strip"?

Answer 1

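It depends on the pattern you expect. If the input always looks like this, you can split on the whitespace between the number and the rest of the string:

import re
s="Lines: 67 \n\nThis is an example"
m=re.match(r'(^Lines: \d+)\s+(.*$)', s)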
print(m.groups())
# ('Lines: 67', 'This is an example')

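Using a plain string split (this only works when the separator is exactly a space followed by two newlines):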

s="Lines: 67 \n\nThis is an example"
s.split(' \n\n', 1)
# ['Lines: 67', 'This is an example']

Or a regex split:

s="Lines: 67 \n\nThis is an example"
re.split(r' \s+', s, maxsplit=1)
# ['Lines: 67', 'This is an example']
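The question also mentions inputs with a variable amount of whitespace after the number. A minimal sketch that handles both sample strings is shown here; the helper name `split_lines_header` is made up for illustration, and the pattern assumes the text always starts with "Lines:" followed by digits.

```python
import re

# Assumption: the text starts with "Lines:" followed by digits.
# re.DOTALL lets "." match newlines, so the second group can span several lines.
HEADER_RE = re.compile(r"(Lines:\s*\d+)\s*(.*)", re.DOTALL)

def split_lines_header(text):
    """Return [header, rest], or None when the text does not start with the header."""
    m = HEADER_RE.match(text)
    if m is None:
        return None
    return [m.group(1), m.group(2)]

print(split_lines_header("Lines: 67 \n\nThis is an example"))
print(split_lines_header("Lines: 6777       \nThis is an example"))
```

Because the `\s*` between the two groups consumes the whitespace, the second element comes back without a leading space or newline, so no extra `.strip()` is needed.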

Answer 2

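Assuming you always want to separate the "Lines: <number>" part from the rest of the string, you can use a regular expression like this: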

>>> import re
>>> strings = "Lines: 67 \n\nThis is an example"
>>> result = re.search(r"(Lines: \d+)([\s\S]+)", strings)
>>> result[1]
'Lines: 67'
>>> result[2]
' \n\nThis is an example'
>>>

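We can break the pattern (Lines: \d+)([\s\S]+) down like this:

We want two capture groups, one for the "Lines: <number>" part and one for the rest of the string: (capturegroup1)(capturegroup2)

Lines: matches the literal start, and \d matches any digit. The + matches one or more occurrences of the preceding \d.

That gives us the first capture group: (Lines: \d+)

Next we need the rest of the string, which includes the \n characters. The . character does not match a newline by default, so instead we match anything that is either \s (a whitespace character) or \S (a non-whitespace character). To accept either one we put both in a set, [\s\S], and use + to match one or more characters from that set.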
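To make the two groups easier to see, here is a small sketch of the same pattern with named groups. The group names `header` and `rest` are invented for this example. It also shows `lstrip()` for dropping the leading whitespace the question asked about, and the `re.DOTALL` flag as an alternative to the `[\s\S]` set.

```python
import re

text = "Lines: 67 \n\nThis is an example"

# Same pattern as in this answer, with (hypothetical) names on the two groups.
pattern = re.compile(r"(?P<header>Lines: \d+)(?P<rest>[\s\S]+)")
m = pattern.search(text)

header = m.group("header")
rest = m.group("rest")
print(repr(header))         # 'Lines: 67'
print(repr(rest))           # ' \n\nThis is an example'
print(repr(rest.lstrip()))  # 'This is an example'

# With re.DOTALL, "." also matches newlines, so ".+" captures the same text here.
m2 = re.search(r"(Lines: \d+)(.+)", text, re.DOTALL)
print(m2.group(2) == rest)  # True
```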

A tool such as https://regexr.com/ can help you work out patterns for other cases.

Answer 3

If you would rather do this without a regular expression:

string = "Lines: 67 \n\nThis is an example"
strlist = string.split()
firstresult = strlist[0] + ' ' + strlist[1]
secondresult = string.split(firstresult)[1].strip(' ')
output = [firstresult, secondresult]
print(output)
# ['Lines: 67', '\n\nThis is an example']

If you also want to remove the \n characters:

secondresult = string.split(firstresult)[1].strip()
output = [firstresult, secondresult]
print(output)
# ['Lines: 67', 'This is an example']
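The approach above rebuilds the header with a single space, so it relies on exactly one space between "Lines:" and the number. A possible variation, sketched here under the assumption that the text always begins with "Lines:" followed by whitespace and a number, is to let `str.split` do all the work with a `maxsplit` of 2. The function name `split_without_regex` is hypothetical.

```python
def split_without_regex(text):
    # Assumption: text looks like "Lines: <number> <anything else>".
    # split(None, 2) splits on runs of whitespace at most twice, and the
    # remainder (third item) comes back without leading whitespace.
    parts = text.split(None, 2)
    if len(parts) < 2:
        raise ValueError("expected 'Lines: <number>' at the start")
    header = parts[0] + " " + parts[1]
    rest = parts[2] if len(parts) == 3 else ""
    return [header, rest]

print(split_without_regex("Lines: 67 \n\nThis is an example"))
print(split_without_regex("Lines: 6777       \nThis is an example"))
```

Unlike the regex versions, this does not check that the second token is actually a number; a call such as `parts[1].isdigit()` could be added if that matters.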
