Question

How can I capture all the strings that lie between a start marker and an end marker into a list?

This is what I tried:

import re

sequence = "This is start #\n hello word #\n #\n my code#\n this is end"

query = '#\n'
r = re.compile(query)
findall = re.findall(query,sequence)
print(findall)

This gives:

['#\n', '#\n', '#\n', '#\n']

I am looking for the text between the '#\n' markers instead.

Answer 1

A simple split() is enough:

sequence = "This is start #\n hello word #\n #\n my code#\n this is end"

parts = sequence.split("#\n")[1:-1]  # discard 1st and last because it is not between #\n

print(parts)

This gives you the following (the first and last parts are discarded because they do not lie between two '#\n'):

[' hello word ', ' ', ' my code'] # ' ' is strictly also between two #\n

You can clean this up:

# remove spaces and "empty" hits if it is only whitespace
mod_parts = [p.strip() for p in parts if p.strip()]

print(mod_parts)

to get:

['hello word', 'my code']

Or, in short:

shorter = [x.strip() for x in sequence.split("#\n")[1:-1] if x.strip()]

Answer 2

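In this case it is best to just use the string method .split() and pass it '#\n' as the separator. You can test the result of s.strip() to filter out parts that are empty or contain only whitespace. If for some reason you do not want the first and last parts, you can drop them with the slice [1:-1].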

sequence = "This is start #\n hello word #\n #\n my code#\n this is end"
print(sequence.split("#\n"))
# ['This is start ', ' hello word ', ' ', ' my code', ' this is end']

print([s.strip() for s in sequence.split("#\n") if s.strip()])
# ['This is start', 'hello word', 'my code', 'this is end']

print([s.strip() for s in sequence.split("#\n") if s.strip()][1:-1])
# ['hello word', 'my code']
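A minimal sketch of the steps above wrapped in a helper. The name between is hypothetical and does not come from the answers. It slices before it filters, which matters when the text itself starts or ends with the delimiter: filtering first and slicing afterwards can then throw away real hits.

```python
def between(text, delim="#\n"):
    """Return the stripped, non-empty pieces that sit between two delimiters.

    Hypothetical helper for illustration only.
    """
    # Slice first: drop the text before the first and after the last delimiter.
    inner = text.split(delim)[1:-1]
    # Then strip each piece and skip pieces that were only whitespace.
    return [p.strip() for p in inner if p.strip()]


sequence = "This is start #\n hello word #\n #\n my code#\n this is end"
print(between(sequence))  # ['hello word', 'my code']

# Edge case: the text begins and ends with the delimiter.
edge = "#\n a #\n b #\n"
print(between(edge))  # ['a', 'b']

# Filtering before slicing loses both hits here, because the empty
# outer parts are already gone when [1:-1] is applied.
print([s.strip() for s in edge.split("#\n") if s.strip()][1:-1])  # []
```

For the sample input from the question both orders give the same result, so the difference only shows up on inputs like edge.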

Answer 3

Try:

print(re.findall("#\n(.*?)#\n", sequence))

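The regex captures (non-greedily) anything between two '#\n', but a '#\n' that closes one match is never reused to open the next one, so the lone ' ' between the second and third '#\n' is skipped. If you want every '#\n' to act as a separator instead (as with split()), you can try a lookahead: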

print(re.findall("#\n(.*?)(?=#\n)", sequence))

In that case the output is:

[' hello word ', ' ', ' my code']
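To make the difference between the two patterns easier to compare, here is a small sketch that builds both from an arbitrary delimiter. find_between and reuse_delimiter are hypothetical names. re.escape and re.DOTALL are additions that the answer does not use: they are assumed here so that the delimiter is matched literally and the captured text may span several lines.

```python
import re


def find_between(text, delim, reuse_delimiter=False):
    """Return the raw text captured between occurrences of delim.

    Hypothetical helper for illustration only.
    reuse_delimiter=False: the closing delimiter is consumed by the match.
    reuse_delimiter=True:  the closing delimiter is only looked at (lookahead),
                           so it can also open the next match.
    """
    d = re.escape(delim)  # treat the delimiter as literal text
    closing = f"(?={d})" if reuse_delimiter else d
    return re.findall(f"{d}(.*?){closing}", text, flags=re.DOTALL)


sequence = "This is start #\n hello word #\n #\n my code#\n this is end"

print(find_between(sequence, "#\n"))
# [' hello word ', ' my code']

print(find_between(sequence, "#\n", reuse_delimiter=True))
# [' hello word ', ' ', ' my code']
```

The results are not stripped, so they can be compared directly with the outputs shown in this answer.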

Answer 4

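As Brian suggested, you can use the split function. However, if you think of the start and end patterns as something like brackets, the correct way to find the tokens is: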

print([s.strip() for s in sequence.split("#\n")][1:-1:2])

It simply skips the strings that lie between an end marker and the next start marker. For example, if the input is

sequence = "This is start #\n hello word #\n BETWEEN END1 AND START2 #\n my code#\n this is end"

the phrase BETWEEN END1 AND START2 should not be captured, so the correct output is:

['hello word', 'my code']
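A short sketch of why the step of 2 works, assuming the markers really come in start/end pairs. The helper name paired_segments and the ValueError check for an odd number of markers are assumptions added for illustration and are not part of the answer.

```python
def paired_segments(text, marker="#\n"):
    """Return the stripped text inside each start/end pair of markers.

    Hypothetical helper for illustration only.
    """
    parts = text.split(marker)
    # n markers produce n + 1 parts, so a balanced (even) number of
    # markers always yields an odd number of parts.
    if len(parts) % 2 == 0:
        raise ValueError("odd number of markers: a start has no matching end")
    # parts[0] is before the first start and parts[-1] is after the last end.
    # Of the rest, indices 1, 3, 5, ... are inside a pair and
    # indices 2, 4, ... are between an end and the next start.
    return [p.strip() for p in parts[1:-1:2]]


sequence = ("This is start #\n hello word #\n BETWEEN END1 AND START2 "
            "#\n my code#\n this is end")
print(paired_segments(sequence))  # ['hello word', 'my code']
```

With the original input from the question the result is the same, because the whitespace-only piece sits at an even index and is skipped.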

Answer 5

You can use

import re
rx = re.compile(r'#\n([\s\S]+?)#\n')

text = """This is start #
 hello word #
 #
 my code#
 this is end"""

matches = rx.findall(text)
print(matches)

This yields

[' hello word ', ' my code']

See a demo for the expression on regex101.com.
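One practical difference from the plain .*? pattern is that [\s\S] also matches newlines, so the captured text may span several lines. The sketch below uses a made-up input (multiline is not from the question) to show this, together with re.DOTALL as an equivalent spelling.

```python
import re

# Hypothetical input in which the wanted text contains a newline.
multiline = "start #\nline one\nline two#\n end"

# '.' does not match a newline by default, so nothing is found.
print(re.findall(r'#\n(.*?)#\n', multiline))
# []

# [\s\S] matches any character, including newlines.
print(re.findall(r'#\n([\s\S]+?)#\n', multiline))
# ['line one\nline two']

# re.DOTALL makes '.' match newlines too, giving the same result.
print(re.findall(r'#\n(.+?)#\n', multiline, flags=re.DOTALL))
# ['line one\nline two']
```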

Python - capture all strings between a start and an end string

5 answers: