Question

I have a text file that contains several occurrences of the string 'Continue reading the main story'. Let's say the text looks like this:

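part 1

Continue reading the main story

part 2

Continue reading the main story

part 3

Continue reading the main story

Continue reading the main story

part 4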

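What I want is part 2 and part 3, as shown below:

part 2

Continue reading the main story

part 3

because that is the text that comes after the first occurrence of 'Continue reading the main story' and before its last occurrence. I thought of using the following code: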

my_regex = re.compile("(Continue reading the main story)"+
                   ".*"+ # match as many chars as possible
                   "(Continue reading the main story)",
                   re.DOTALL)
new_str = my_regex.sub("\1\2", text)

However, it doesn't work. How can I fix it?

Answer 1

Try the following regular expression. I used lookbehind and lookahead:

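import re

# The lookbehind anchors on "part 1" plus the first delimiter;
# the lookahead anchors on the last delimiter followed by "part 4".
rx = r"(?<=part 1\n{2}Continue reading the main story).*(?=Continue reading the main story[\r\n]+part 4)"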

for match in re.finditer(rx, text, re.IGNORECASE | re.DOTALL | re.MULTILINE):
    print(match.group().strip())

With the text you provided, it prints

part 2

Continue reading the main story

part 3

Continue reading the main story
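
The lookarounds above hard-code "part 1" and "part 4". As a rough sketch, assuming only the delimiter string is known in advance, the same span can also be located without a regex by using `str.find` and `str.rfind`. The helper name `between_first_and_last` is made up for this illustration and is not part of the answer.

```python
DELIM = "Continue reading the main story"

text = """\
part 1

Continue reading the main story

part 2

Continue reading the main story

part 3

Continue reading the main story

Continue reading the main story

part 4
"""


def between_first_and_last(text, delim=DELIM):
    """Return the stripped text between the first and last occurrence of delim."""
    first = text.find(delim)   # index of the first occurrence, or -1
    last = text.rfind(delim)   # index of the last occurrence, or -1
    # With fewer than two occurrences there is nothing in between.
    if first == -1 or first == last:
        return ""
    return text[first + len(delim):last].strip()


result = between_first_and_last(text)
print(result)
```

On the sample text this yields the same span as the regex above, including the trailing delimiter line.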

Answer 2

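If you know that your text does not start with "Continue..." and does not end with "Continue...", you can split on the "Continue..." string, drop the first, last, and empty items, and you are left with what you want.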

Result

import re
text = """\
part 1

Continue reading the main story

part 2

Continue reading the main story

part 3

Continue reading the main story

Continue reading the main story

part 4
"""

parts = re.split('Continue reading the main story', text)
print(parts)
# Ignore first and last part, test for and ignore
# empty (all whitespace) strings
innerparts = [part for part in parts[1:-1] if part.strip()]
print("".join(innerparts))

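(There are still a lot of newlines, because that is how the input is laid out. If you want to get rid of them, you can strip the whitespace from each part so that only part 2 and part 3 remain.)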
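
One possible way to drop the extra newlines, sketched under the assumption that each remaining part should end up on its own line: strip every piece before joining. The constant `DELIM` is introduced here only for readability, and the sample text is repeated so the snippet runs on its own.

```python
import re

DELIM = "Continue reading the main story"

text = """\
part 1

Continue reading the main story

part 2

Continue reading the main story

part 3

Continue reading the main story

Continue reading the main story

part 4
"""

parts = re.split(re.escape(DELIM), text)
# Skip the first and last piece, drop whitespace-only pieces,
# and strip the surrounding newlines from the pieces that remain.
innerparts = [part.strip() for part in parts[1:-1] if part.strip()]
print("\n".join(innerparts))
```

Note that splitting discards the delimiter itself, so the 'Continue reading the main story' line between part 2 and part 3 does not appear in this output.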

Answer 3

A simple re.findall() will do.

import re

# The greedy .* combined with re.DOTALL spans from the first occurrence to the last one.
rgx = r'Continue reading the main story(.*)Continue reading the main story'
match = re.findall(rgx, text, re.DOTALL)
if match:
    result = match[0].strip()
    print(result)

With your given text, this prints

part 2

Continue reading the main story

part 3

Continue reading the main story
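
The pattern relies on `.*` being greedy: with `re.DOTALL` it first runs to the end of the text and then backtracks to the last occurrence of the delimiter. A small sketch contrasting it with the non-greedy `.*?`, which stops at the second occurrence instead (the names `DELIM`, `greedy`, and `lazy` are only for illustration):

```python
import re

DELIM = "Continue reading the main story"

text = """\
part 1

Continue reading the main story

part 2

Continue reading the main story

part 3

Continue reading the main story

Continue reading the main story

part 4
"""

delim_rx = re.escape(DELIM)

# Greedy: the group extends up to the LAST occurrence of the delimiter.
greedy = re.search(delim_rx + r"(.*)" + delim_rx, text, re.DOTALL)
# Non-greedy: the group stops at the NEXT occurrence of the delimiter.
lazy = re.search(delim_rx + r"(.*?)" + delim_rx, text, re.DOTALL)

greedy_text = greedy.group(1).strip()
lazy_text = lazy.group(1).strip()

print(repr(greedy_text))
print(repr(lazy_text))
```

Here the greedy version returns everything from part 2 through the third delimiter line, while the non-greedy version returns only part 2.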

Answer 4

You can also try the following pattern:

import re
s = """
    part 1

    Continue reading the main story

    part 2

    Continue reading the main story

    part 3

    Continue reading the main story

    Continue reading the main story

    part 4
    """
# Capture lazily from the first delimiter up to the two consecutive delimiters before part 4.
print(re.findall(r'(?:\s+Continue reading the main story\s\n)([\s\S]*?)(?:\n\s+Continue reading the main story\s){2}', s)[0])

Output:

part 2

Continue reading the main story

part 3

Python regex: keep the lines between the first and last occurrence of a term

4 answers: