Question

I have a string:

mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"

What I want is the list of substrings between the markers start="&marker1" and end="/\n". So the expected result is:

whatIwant = ["The String that I want", "Another string that I want"]

I read the answers here:

and tried this, but it didn't work:

>>> import re
>>> mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
>>> whatIwant = re.search("&marker1(.*)/\n", mystr)
>>> whatIwant.group(1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'
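The None can be reproduced in isolation. Because . does not match \n, and &marker1 is immediately followed by a newline, (.*) can only match an empty string at that point, so the pattern would need /\n right after the marker, which never occurs. A minimal sketch using the same mystr:

```python
import re

mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"

# "." does not match "\n", so after "&marker1" the group (.*) can only match
# an empty string, and "/\n" would have to follow the marker directly.
# The next character is "\n", so no position in mystr satisfies the pattern.
print(re.search("&marker1(.*)/\n", mystr))  # None

# Matching the newline after the marker explicitly lets (.*) consume the
# text line. re.search still stops at the first match, though.
m = re.search("&marker1\n(.*)/\n", mystr)
print(repr(m.group(1)))
```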

How can I fix this? Also, my string is very long:

>>> len(myactualstring)
7792818
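For input of this size, one option is to compile the pattern once and use re.finditer, which yields match objects one at a time instead of building the whole result list up front. This is a sketch that assumes the text already fits in memory as a str; the helper name iter_between and the repeated sample data are hypothetical:

```python
import re

# Compile once: marker, newline, lazy capture, optional whitespace, "/" + newline.
PATTERN = re.compile(r"&marker1\n(.*?)\s*/\n")

def iter_between(text):
    """Hypothetical helper: yield each captured substring one at a time."""
    for match in PATTERN.finditer(text):
        yield match.group(1)

# Stand-in for a long input (hypothetical data): the sample repeated many times.
sample = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
big = sample * 100_000

count = 0
for piece in iter_between(big):
    count += 1  # process each piece here instead of storing them all
print(count)
```

This avoids holding every result at once, but the input string itself still has to be in memory.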

Answer 1

Consider this option using re.findall:

import re

mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
matches = re.findall(r'&marker1\n(.*?)\s*/\n', mystr)
print(matches)

This prints:

['The String that I want', 'Another string that I want']

Here is an explanation of the regex pattern:

&marker1    match a marker
\n          newline
(.*?)       match AND capture all content until reaching the first
\s*         optional whitespace, followed by
/\n         / and newline

Note that re.findall returns only what the capture group matched, which is exactly what you want to extract.
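The same pattern can carry its explanation inline if it is written with re.VERBOSE, where unescaped whitespace and # comments inside the pattern are ignored. A sketch, equivalent to the pattern above:

```python
import re

mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"

# Same pattern as above, spread over several lines with re.VERBOSE.
pattern = re.compile(r"""
    &marker1   # the start marker
    \n         # the newline after it
    (.*?)      # capture, lazily, everything up to the first...
    \s*        # ...run of optional whitespace, followed by
    /\n        # a slash and a newline (the end marker)
""", re.VERBOSE)

matches = pattern.findall(mystr)
print(matches)
```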

Answer 2

How can it be fixed? I would do:

import re
mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
found = re.findall(r"\&marker1\n(.*?)/\n", mystr)
print(found)

Output:

['The String that I want ', 'Another string that I want ']
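Each captured string keeps the space that precedes the /. One way to drop it is to strip the results afterwards with str.rstrip; putting \s* before / in the pattern is the other. A small sketch of the first option:

```python
import re

mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"

found = re.findall(r"\&marker1\n(.*?)/\n", mystr)
# Each capture still ends with the space that preceded "/"; remove it.
cleaned = [s.rstrip() for s in found]
print(cleaned)
```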

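Note that:

& has no special meaning in a Python re pattern outside a character class, so &marker1 matches literally; escaping it as \& is harmless but not required.
. matches everything except a newline.
findall is better suited here than search, because search returns only the first match while findall returns all of them.
*? is non-greedy. In this case .* would also work, because . does not match a newline, but in other cases the match could extend further than you expect.
I used a so-called raw string (r prefix) to simplify escaping.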
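To see how a greedy .* can overshoot, turn on re.DOTALL so that . also matches newlines. A sketch comparing the greedy and non-greedy forms on the same mystr:

```python
import re

mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"

# With re.DOTALL, "." also matches "\n", so a greedy .* runs to the LAST "/\n"
# and swallows the second marker into a single match.
greedy = re.findall(r"&marker1\n(.*)/\n", mystr, re.DOTALL)

# The non-greedy .*? stops at the FIRST "/\n" after each marker.
lazy = re.findall(r"&marker1\n(.*?)/\n", mystr, re.DOTALL)

print(greedy)
print(lazy)
```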

See the re module documentation for a discussion of raw strings and the list of characters that have special meaning.
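For arbitrary start and end markers, re.escape can turn each marker into a literal pattern, which sidesteps the question of which characters need escaping. A sketch with a hypothetical helper named between; using re.DOTALL, so a capture may span several lines, is an assumption beyond what the answers do:

```python
import re

def between(text, start, end):
    """Hypothetical helper: return every substring between `start` and `end`.

    re.escape() makes both markers literal. re.DOTALL plus the lazy (.*?)
    lets a capture span lines while still stopping at the first `end`
    after each `start`.
    """
    pattern = re.escape(start) + "(.*?)" + re.escape(end)
    return re.findall(pattern, text, re.DOTALL)

mystr = "&marker1\nThe String that I want /\n&marker1\nAnother string that I want /\n"
print([s.strip() for s in between(mystr, "&marker1\n", "/\n")])
```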

Extract all substrings between two markers
