Question

I have the following sample text:

mystr = r'''\documentclass[12pt]{article}
\usepackage{amsmath}
\title{\LaTeX}
\begin{document}
\section{Introduction}
This is introduction paragraph
\section{Non-Introduction}
This is non-introduction paragraph
\section{Sample section}
This is sample section paragraph
\begin{itemize}
  \item Item 1
  \item Item 2
\end{itemize}
\end{document}'''

What I want is a regular expression that extracts the following from mystr:

['This is introduction paragraph','This is non-introduction paragraph','    This is sample section paragraph\n \begin{itemize}\n\item Item 1\n\item Item 2\n\end{itemize}']

Answer 1

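If, for whatever reason, you need to use a regular expression (perhaps because splitting the string involves more than just "a"), the re module also has a split function: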

import re
str_ = "a quick brown fox jumps over a lazy dog than a quick elephant"


print(re.split(r'\s?\ba\b\s?',str_))

# ['', 'quick brown fox jumps over', 'lazy dog than', 'quick elephant']

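Edit: expanding the answer with the new information you provided...

After your edit you wrote a better problem description and included text that looks like LaTeX. I think you need to extract the lines that do not start with \, since the lines that do are LaTeX commands. In other words, you need the lines that contain only text. Try the following, still using a regular expression: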

import re

mystr = r'''\documentclass[12pt]{article}
\usepackage{amsmath}
\title{\LaTeX}
\begin{document}
\section{Introduction}
This is introduction paragraph
\section{Non-Introduction}
This is non-introduction paragraph
\section{Sample section}
This is sample section paragraph
\end{document}'''

pattern = r"^[^\\]*\n"


matches = re.findall(pattern, mystr, flags=re.M)

print(matches)

# ['This is introduction paragraph\n', 'This is non-introduction paragraph\n', 'This is sample section paragraph\n']
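Note that the sample in this answer leaves out the itemize block from the question, and a line-based pattern like the one above skips every line that contains a backslash, so the \begin{itemize} ... \end{itemize} lines would be lost. A minimal sketch of an alternative, assuming each section body runs from the line after \section{...} up to the next \section{ or to \end{document} (the names pattern and sections are just illustrative):

```python
import re

mystr = r'''\documentclass[12pt]{article}
\usepackage{amsmath}
\title{\LaTeX}
\begin{document}
\section{Introduction}
This is introduction paragraph
\section{Non-Introduction}
This is non-introduction paragraph
\section{Sample section}
This is sample section paragraph
\begin{itemize}
  \item Item 1
  \item Item 2
\end{itemize}
\end{document}'''

# Capture lazily after each \section{...} header and stop just before the
# newline that precedes the next \section{ or \end{document}.
# re.S lets "." match newlines, so a body may span several lines.
pattern = r"\\section\{[^}]*\}\n(.*?)(?=\n\\(?:section\{|end\{document\}))"

sections = re.findall(pattern, mystr, flags=re.S)

for body in sections:
    print(repr(body))
```

This keeps the original indentation of the \item lines. It is only a sketch: it assumes no nested braces in section titles and that \end{document} is present.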

Answer 2

You can use the split method of str:

my_string = "a quick brown fox jumps over a lazy dog than a quick elephant"
word = "a "
print(my_string.split(word))

Result:

['', 'quick brown fox jumps over ', 'lazy dog than ', 'quick elephant']

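Note: do not use str as a variable name. It is not a keyword, but it is the name of Python's built-in string type, and reusing it shadows that type.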
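The two answers do not behave the same on every input. A plain str.split on "a " also cuts inside any word that happens to end in "a", while the \b anchors in the regular expression restrict the match to the standalone word. A small illustration, using a made-up sentence that is not from the question:

```python
import re

# Hypothetical input chosen so that one word ("banana") ends in "a".
text = "a banana is a fruit"

# Literal substring split: "banana " contains "a " at its end, so it is cut.
plain = text.split("a ")

# Word-boundary split: only the standalone word "a" is a separator.
regex = re.split(r"\s?\ba\b\s?", text)

print(plain)  # ['', 'banan', 'is ', 'fruit']
print(regex)  # ['', 'banana is', 'fruit']
```

If the separator must be a whole word, the regex version is probably the safer choice.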

How do I extract the text between a word and the next word?
