I have the following sample text:
mystr = r'''\documentclass[12pt]{article}
\usepackage{amsmath}
\title{\LaTeX}
\begin{document}
\section{Introduction}
This is introduction paragraph
\section{Non-Introduction}
This is non-introduction paragraph
\section{Sample section}
This is sample section paragraph
\begin{itemize}
\item Item 1
\item Item 2
\end{itemize}
\end{document}'''
What I want is a regular expression that extracts the following from mystr:
['This is introduction paragraph','This is non-introduction paragraph',' This is sample section paragraph\n \begin{itemize}\n\item Item 1\n\item Item 2\n\end{itemize}']
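Answer 0 (score: 2)
If, for whatever reason, you need a regular expression (perhaps because splitting the string involves more than the literal "a"), the re module also provides a split function: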
import re
str_ = "a quick brown fox jumps over a lazy dog than a quick elephant"
print(re.split(r'\s?\ba\b\s?',str_))
# ['', 'quick brown fox jumps over', 'lazy dog than', 'quick elephant']
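Edit: expanding the answer with the new information you provided...
Now that your edit gives a better problem description and includes text that looks like LaTeX, I think you need to extract the lines that do not start with \, since those are LaTeX commands. In other words, you need the lines that contain only text. Try the following, still using a regular expression: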
import re
mystr = r'''\documentclass[12pt]{article}
\usepackage{amsmath}
\title{\LaTeX}
\begin{document}
\section{Introduction}
This is introduction paragraph
\section{Non-Introduction}
This is non-introduction paragraph
\section{Sample section}
This is sample section paragraph
\end{document}'''
pattern = r"^[^\\]*\n"
matches = re.findall(pattern, mystr, flags=re.M)
print(matches)
# ['This is introduction paragraph\n', 'This is non-introduction paragraph\n', 'This is sample section paragraph\n']
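The pattern above returns single lines only, so on the question's original text it would skip the itemize block, whose lines all start with a backslash. Below is a minimal sketch of one way to capture each section body as a whole. It assumes every body ends at the next \section or at \end{document}; the names section_body and sections are illustrative, not from the original answer.

```python
import re

mystr = r'''\documentclass[12pt]{article}
\usepackage{amsmath}
\title{\LaTeX}
\begin{document}
\section{Introduction}
This is introduction paragraph
\section{Non-Introduction}
This is non-introduction paragraph
\section{Sample section}
This is sample section paragraph
\begin{itemize}
\item Item 1
\item Item 2
\end{itemize}
\end{document}'''

# Match a \section{...} heading line, then lazily capture everything up to
# (but not including) the newline that precedes the next \section or the
# closing \end{document}. re.S lets "." run across line breaks.
section_body = re.compile(
    r"\\section\{[^}]*\}\n(.*?)(?=\n\\(?:section|end\{document\}))",
    flags=re.S,
)

sections = section_body.findall(mystr)
print(sections)
```

The lookahead keeps the terminator out of the match, so the following \section heading is still available to start the next match.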
Answer 1 (score: 0)
You can use the split method of str:
my_string = "a quick brown fox jumps over a lazy dog than a quick elephant"
word = "a "
my_string.split(word)
Result:
['', 'quick brown fox jumps over ', 'lazy dog than ', 'quick elephant']
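One caveat with a plain substring separator: "a " also matches the end of any word that ends in "a". The sketch below uses a made-up sentence (not from the question) to compare it with a word-boundary pattern passed to re.split:

```python
import re

# Hypothetical sentence containing a word that ends in "a" followed by a space.
sentence = "a panda saw a zebra"

# Substring split: also cuts "panda " apart, because it ends with "a ".
plain_parts = sentence.split("a ")

# Word-boundary split: only the standalone word "a" acts as a separator,
# and the optional \s? on each side absorbs the surrounding spaces.
regex_parts = re.split(r"\s?\ba\b\s?", sentence)

print(plain_parts)   # ['', 'pand', 'saw ', 'zebra']
print(regex_parts)   # ['', 'panda saw', 'zebra']
```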
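Note: don't use str as a variable name. It is not a keyword, but it is the name of Python's built-in string type, and rebinding it shadows that built-in.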
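A small sketch of what goes wrong once the name str is rebound. The function names here are hypothetical and exist only for the demonstration:

```python
def call_shadowed_str():
    # Rebinding str inside this function hides the built-in type in this scope.
    str = "a quick brown fox"
    try:
        return str(42)  # now tries to call a string object
    except TypeError as exc:
        return f"TypeError: {exc}"


def call_builtin_str():
    # With a different variable name, str still refers to the built-in type.
    text = "a quick brown fox"
    return str(len(text))


print(call_shadowed_str())
print(call_builtin_str())
```

The first call reports a TypeError because the local string is not callable. The second converts the length to text as expected.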