Question

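We're having some issues with markdown content. A few jQuery editors we used did not write proper markdown syntax. Embedded links used the 'label' format, which places the link definitions at the bottom of the document (just like the StackOverflow editor does). The problem is that those definitions were sometimes formatted in a nonstandard way. They are allowed to be prefixed by 0-3 spaces, but some came in with 4 spaces (you may notice that StackOverflow enforces 2 spaces in its javascript), which causes markdown parsers to treat them as preformatted text.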

As a quick example:

This is a sample document that would have inline links. 
[Example 0][0], [Example 1][1], [Example 2][2] , [Example 3][3] , [Example 4][4]

[0]: http://example.com
 [1]:      http://example.com/1
  [2] : http://example.com/2
   [3]: http://example.com/3
    [4]  : http://example.com/4

I want to reformat the last section into correct markdown:

[0]: http://example.com
[1]: http://example.com/1
[2]: http://example.com/2
[3]: http://example.com/3
[4]: http://example.com/4

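I'm struggling to come up with the right regex to grab the 'labels' section. I can grab the individual labels fine once I'm inside that section, but isolating the section itself is what I'm stuck on.

This is what I have so far: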

RE_footnote = re.compile("""
    (?P<labels_section>
        ^[\t\ ]*$                             ## we must start with an empty line
        \s+                       
        (?P<labels>
            (?P<a_label>
                ^
                    [\ \t]*                     ## we could have 0-n spaces or tabs
                    \[                          ## BRACKET - open
                        (?P<id>
                            [^^\]]+
                        )
                    \]                          ## BRACKET - close
                    \s*
                    :                           ## COLON
                    \s*
                    (?P<link>                   ## WE want anything here
                        [^$]+
                    )
                $
            )+                                  ## multiple labels
        )
    )
""",re.VERBOSE|re.I|re.M)

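The specific problems I'm having:

I can't figure out how to allow 1 or more "empty lines". This raises an invalid-regex error, "nothing to repeat":

(?:                             ## wrap this in a non-capturing group, require 1+ occurrences
    ^[\ \t]*$
)+

Without the \s+ whitespace match before the labels group, the match fails. I can't figure out what is going wrong or why.
I'd like this to match only at the END of the document, to ensure we're only fixing these javascript bugs (and not the core of the document). All my attempts at getting \z to work have failed miserably.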

Can someone offer some advice?

Update

This works:

RE_MARKDOWN_footnote = re.compile(r"""
    (?P<labels_section>
        (?:                            ## we must start with an empty / whitespace-only line
            ^\s*$
        )                              
        \s*                             ## there can be more whitespace lines
        (?P<labels>
            (?P<a_label>
                ^
                    [\ \t]*                     ## we could have 0-n spaces or tabs
                    \[                          ## BRACKET - open
                        (?P<id>
                            [^^\]]+
                        )
                    \]                          ## BRACKET - close
                    \s*
                    :                           ## COLON
                    \s*
                    (?P<link>                   ## WE want anything here
                        [^$]+
                    )
                $
            )+                                  ## multiple labels
        )
        \s*                                     ## we might have some empty lines 
        \Z                                      ## ensure the end of document
    )
""",re.VERBOSE|re.I|re.M)
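
Two mechanics that the working pattern above appears to rely on can be checked in isolation. The following is a minimal sketch, not part of the original post: the sample `text` and the names `no_gap` and `with_gap` are made up for illustration. It suggests why some whitespace has to be consumed between the empty-line `$` and the label's `^`, and how `\Z` differs from `$` under `re.M`.

```python
import re

# Hypothetical sample: a paragraph, an empty line, one indented label, trailing blank line.
text = "para\n\n  [0] : http://example.com\n\n"

# "$" stops *before* the newline of the empty line, so without consuming that
# newline the following "^" is still on the empty line and "[" can never follow.
no_gap = re.compile(r"^[ \t]*$^[ \t]*\[", re.M)

# Consuming whitespace (which includes the newline) lets "^" land on the label line.
with_gap = re.compile(r"^[ \t]*$\s+^[ \t]*\[", re.M)

print(no_gap.search(text))            # no match object is returned
print(with_gap.search(text) is not None)

# With re.M, "$" matches at every line end; "\Z" matches only at the end of the string.
line_ends = [m.start() for m in re.finditer(r"$", text, re.M)]
string_end = [m.start() for m in re.finditer(r"\Z", text, re.M)]
print(line_ends, string_end)
```

Note that Python's re module spells the end-of-string anchor as uppercase `\Z`; whether lowercase `\z` is accepted depends on the Python version, which may explain the failed attempts mentioned in the question.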

Answer 1

I just started from scratch; is there a reason why something this simple wouldn't work?

^\s*                # beginning of the line; may include whitespace
  \[                # opening bracket
     (?P<id>\d+)    # our ID
  \]                # closing bracket
\s*                 # optional whitespace
  :                 # colon
\s*                 # optional whitespace
  (?P<link>[^\n]+)  # our link is everything up to a new line
$                   # end of the line

This is done using the global and multi-line modifiers, gm. Replace each match with: [\id]: \link. Here is a working example: http://regex101.com/r/mM8dI2
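
For readers applying this from Python rather than regex101, here is a rough sketch of the same idea. It is an adaptation, not the answer's exact pattern: `[ \t]*` is used in place of `\s*` (an assumption made here because `\s` also matches newlines, so `^\s*` could absorb blank lines in front of a label), the replacement uses Python's `\g<name>` syntax, and the names `LABEL` and `normalize_labels` are hypothetical.

```python
import re

# Adapted from the answer's pattern. Assumption: [ \t]* replaces \s* so that a
# match stays on a single line and cannot swallow neighbouring blank lines.
LABEL = re.compile(
    r"""
    ^[ \t]*                 # beginning of the line; may include spaces or tabs
    \[(?P<id>\d+)\]         # numeric ID in brackets
    [ \t]*:[ \t]*           # colon with optional surrounding spaces or tabs
    (?P<link>[^\n]+)        # the link is everything up to the newline
    $                       # end of the line
    """,
    re.MULTILINE | re.VERBOSE,
)


def normalize_labels(text):
    """Rewrite every label line as '[id]: link'."""
    # re.sub replaces all matches, which plays the role of the "g" modifier.
    return LABEL.sub(r"[\g<id>]: \g<link>", text)


if __name__ == "__main__":
    sample = (
        "[Example 0][0], [Example 4][4]\n"
        "\n"
        "[0]: http://example.com\n"
        "    [4]  : http://example.com/4\n"
    )
    print(normalize_labels(sample))
```

Like the answer's pattern, this sketch rewrites matching label lines anywhere in the document, not only in the trailing section, and it only handles numeric IDs.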

Using a regex to fix markdown input - link labels