Matching dashes of various lengths

Time: 2017-07-23 23:02:29

Tags: python regex

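I have a block of text in which multiple paragraphs are separated by lines of dashes of varying length. I want to use Python to match the boundaries between the paragraphs. My requirements are: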

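  1. Match lines that contain only dashes, of any length
  2. Exclude lines that contain dashes together with any other characters

Here is a sample text block: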

    Believing neglected so so allowance existence departure in.
    In design active temper be uneasy. Thirty for remove plenty 
    regard you summer though. He preference connection astonished 
    on of yet. ------ Partiality on or continuing in particular principles as. 
    Do believing oh disposing to supported allowance we.
    -------
    Admiration we surrounded possession frequently he. 
    Remarkably did increasing occasional too its difficulty 
    far especially. Known tiled but sorry joy balls. Bed sudden 
    
    manner indeed fat now feebly. Face do with in need of 
    wife paid that be. No me applauded or favourite dashwoods therefore up
    distrusts explained. 
    ----t--
    ------
    And produce say the ten moments parties. Simple innate summer 
    fat appear basket his desire joy. Outward clothes promise at gravity 
    do excited. 
    Sufficient particular impossible by reasonable oh expression is. Yet 
    preference 
    connection unpleasant yet melancholy but end appearance. And 
    excellence partiality 
    estimating terminated day everything. 
    ---------    
    

I tried the following:

    r"-*.-"g or (.*?)-+
    

However, these match every line that contains two or more dashes, including lines that also contain other characters.

2 Answers:

Answer 0: (score: 1)

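Just r"^[-]+$" will do. Remember to specify MULTILINE mode so that ^ and $ match the start and end of each line respectively, rather than only the start and end of the whole string.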
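A minimal sketch of the pattern with and without the flag. The `text` below is a shortened stand-in for the question's sample, and the names `separator`, `matches` and `without_flag` are only illustrative.

```python
import re

# Shortened stand-in for the sample text in the question (illustrative only).
text = (
    "on of yet. ------ Partiality on or\n"
    "-------\n"
    "Admiration we surrounded\n"
    "----t--\n"
    "------\n"
    "estimating terminated day everything.\n"
    "---------    \n"
)

# With re.MULTILINE, ^ and $ anchor at every line boundary.
separator = re.compile(r"^[-]+$", re.MULTILINE)
matches = separator.findall(text)
print(matches)  # ['-------', '------']

# Without the flag, ^ only anchors at the start of the whole string,
# and the string does not start with a dash, so nothing is found.
without_flag = re.findall(r"^[-]+$", text)
print(without_flag)  # []
```

Lines with dashes mixed with other characters are skipped, and so is the final line, which ends in spaces.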

Actually, the last line will not match because it has trailing spaces. If you want to allow spaces after the dashes, you can use r"^[-]+[ ]*$".

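Another thing: if you want to match only the lines between paragraphs and not the last line, you can add a negative lookahead for the end of the string: r"^[-]+[ ]*$(?!\s*\Z)". Inside a character class such as [^\Z], \Z does not act as an end-of-string anchor.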
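A minimal sketch of skipping the final separator, assuming a negative lookahead `(?!\s*\Z)` is an acceptable way to say "not followed only by whitespace up to the end of the string". The sample `text` and the names `inner` and `inner_matches` are made up for illustration.

```python
import re

# Illustrative text: two separators between paragraphs, one trailing separator.
text = (
    "First paragraph.\n"
    "-------\n"
    "Second paragraph.\n"
    "------\n"
    "Third paragraph.\n"
    "---------    \n"
)

# (?!\s*\Z) rejects a dash line when only whitespace remains after it.
inner = re.compile(r"^[-]+[ ]*$(?!\s*\Z)", re.MULTILINE)
inner_matches = inner.findall(text)
print(inner_matches)  # ['-------', '------']
```

The lookahead consumes nothing, so the matched text is still just the dash line itself.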

Edit: taken from @sln's comment, here are a few nuances I forgot:

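  1. You can set the MULTILINE flag inline by putting (?m) at the start of the pattern.
  2. The character class [^\S\r\n] matches all whitespace except line breaks. You can use it instead of [ ], which matches only the space character.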
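Both points can be sketched together. The sample `text` (with a tab after the first run of dashes) and the variable names are assumptions made for the example.

```python
import re

# The first separator is followed by a tab and a space; the second has nothing after it.
text = "para one\n-----\t \npara two\n---\n\npara three\n"

# (?m) switches on MULTILINE inside the pattern itself.
# [^\S\r\n] means "whitespace, but not a carriage return or newline".
any_blank = re.compile(r"(?m)^-+[^\S\r\n]*$")
found = any_blank.findall(text)
print(found)  # ['-----\t ', '---']

# [ ] only allows spaces, so the separator followed by a tab is missed.
spaces_only = re.findall(r"(?m)^-+[ ]*$", text)
print(spaces_only)  # ['---']
```

Excluding \r and \n from the class keeps the match on a single line.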

Answer 1: (score: 0)

r'^[^-]*$'

will match any line that does not contain -.

You also need the multiline flag, re.M, to handle multi-line input.

See the results here: https://regex101.com/r/iRkPep/1
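A small sketch of what this pattern returns in Python, using a made-up `text`. The negated class [^-] also matches newlines, so consecutive dash-free lines come back as a single match. The pattern therefore picks out the paragraphs, not the separator lines.

```python
import re

# Illustrative text: two separators, with a two-line paragraph in between.
text = "para one\n-----\npara two\nmore\n---"

# re.M lets ^ and $ anchor at line boundaries; [^-]* may still span newlines.
chunks = re.findall(r"^[^-]*$", text, re.M)
print(chunks)  # ['para one', 'para two\nmore']
```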